diff --git a/README.md b/README.md
index bc371be1222ac20a56f289f55a4c4d3c91a8e0fb..499cc73047f58096bc46ba407cec3cad3f54ad17 100755
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ Go in the downloaded `predihood/` directory (which contains `setup.py`) and run
 python3 -m pip install -e . -r requirements.txt
 ```
-This command install dependencies, including [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), a lightweight API which enables the querying of the MongoDB database containing information about French neighbourhoods.
+This command install dependencies, including [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), a lightweight API which enables the querying of the MongoDB database containing information about French neighbourhoods. Note that the download time may be quite long, as the mongiris API includes a dump of French neighbourhoods (700 MB).
 Next, to install the database, run the MongoDB server and execute this command (from the MongoDB's executables directory if needed):
diff --git a/paper.md b/paper.md
index 0328f09c6804d6cf6dac8cd83180fe4faaf54a21..aa2e1f925375bcc9878a3e1c8bd4308013ec83e2 100644
--- a/paper.md
+++ b/paper.md
@@ -31,7 +31,7 @@ Finding a real estate in a new city is a real challenge. We often arrive in a ci
 # Statement of need
-Several projects focus on qualifying neighbourhoods using social networks. For instance, the Livehoods project define and compute dynamics of neighbourhoods [@cranshaw2012livehoods] while the Hoodsquare project detects similar areas based on Foursquare check-ins [@zhang2013hoodsquare]. Crowd-based systems are interesting but may be biased. 
[DataFrance](https://datafrance.info/) is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)) and surveys from newspapers for prices (L'Express). DataFrance enables the visualization of hundreds of indicators, but makes it difficult to judge on the environment of a neighbourhood.
+Several projects focus on qualifying neighbourhoods using social networks. For instance, the Livehoods project defines and computes dynamics of neighbourhoods [@cranshaw2012livehoods] while the Hoodsquare project detects similar areas based on Foursquare check-ins [@zhang2013hoodsquare]. Crowd-based systems are interesting but may be biased. [DataFrance](https://datafrance.info/) is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)) and surveys from newspapers for prices (L'Express). DataFrance enables the visualization of hundreds of indicators, but makes it difficult to judge on the environment of a neighbourhood.
 There is no simple description of neighbourhood's environment.
 # Methodology
@@ -46,21 +46,21 @@ Predihood provides the following functionnalities:
 Neighbourhoods are represented as [GeoJSON objects](https://geojson.org/) and include:
-- a geometry (multi-polygons), which describe the shape of the neighbourhood;
+- a geometry (multi-polygons), which describes the shape of the neighbourhood;
 - properties, with descriptive information (e.g., name, city postcode) and indicators which quantify the environment (e.g., number of restaurants, of bakeries, average income, unemployment rate, number of houses with a superficy above 250 $m^2$).
 These hundreds of indicators are used for predicting the values of environment variables. To add new neighbourhoods, it is necessary to store them as GeoJSON and make them accessible by Predihood. Besides, some neighbourhoods have to be manually annotated (i.e., giving a value for each of the six environment variables).
-The current version of Predihood is bundled with data from France using the [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris) project.
-It includes about 50,000 neighbourhoods with 640 indicators, and 300 neighbouhoods were annotated by social science researchers (one to two hours per neighbourhood to investigate building and streets pictures, parked cars, facilities and greens areas from services such as Google Street View).
+The current version of Predihood is bundled with data from France using the [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris) project (in which unit division named IRIS stand for neighbourhoods).
+It includes about 50,000 neighbourhoods with 640 indicators, and 270 neighbouhoods were annotated by social science researchers (one to two hours per neighbourhood to investigate building and streets pictures, parked cars, facilities and green areas from services such as Google Street View).
 ## Predicting environment
-Machine learning algorithms need a dataset, as illustrated by Figure 1. 
In Predihood, a dataset is composed of the identifier of the neighbourhood (grey column, named code INSEE), its indicators (yellow columns) that have been normalized by density of population (green column) and optionnaly the assessment of social science researchers for the six environment variables (blue columns). The objective of Predihood is to fill automatically question marks for neighbourhoods that are not yet assessed.
+Machine learning algorithms need a dataset, as illustrated by Figure 1. In Predihood, a dataset is composed of the identifier of the neighbourhood (grey column), its indicators (yellow columns, showing only a subset) that have been normalized by density of population (green column) and optionnaly the assessment of social science researchers for the six environment variables (blue columns). The objective of Predihood is to fill automatically question marks for neighbourhoods that are not yet assessed.
-
+
-To perform prediction, a selection process first selects subsets of relevant indicators. These subsets, called _lists_, contain from 10 to 100 indicators. Predihood provides a cartographic interface based on [Leaflet](https://leafletjs.com/) and [Open Street Map](https://www.openstreetmap.org/), as shown in Figure 2. It enables to search for a neighbourhood and predict its environment by selecting an algorithm. The current version of Predihood currently includes 8 predictive algorithms from [scikit-learn](https://scikit-learn.org/) (e.g., Random Forest).
+To perform prediction, a selection process first selects subsets of relevant indicators. These subsets, called _lists_, contain from 10 to 100 indicators. Predihood provides a cartographic web interface based on [Leaflet](https://leafletjs.com/) and [Open Street Map](https://www.openstreetmap.org/), as shown in Figure 2. It enables to search for a neighbourhood and predict its environment by selecting an algorithm. 
The current version of Predihood currently includes 8 predictive algorithms from [scikit-learn](https://scikit-learn.org/) (e.g., Random Forest).
 
@@ -97,7 +97,7 @@ class MyOwnClassifier(Classifier):
     def fit(self, X, y):
         # do stuff here
-    def predict(self, X):
+    def predict(self, df):
         # do stuff here
     def get_params(self, deep=True):
diff --git a/predihood-accuracies.png b/predihood-accuracies.png
index f927c82a25287930ca23bdf0c14ca4276cb93c84..b20d698aa34235cc9f8f3f7734298ebdb4415b26 100644
Binary files a/predihood-accuracies.png and b/predihood-accuracies.png differ
diff --git a/predihood/algorithms/MyOwnClassifier.py b/predihood/algorithms/MyOwnClassifier.py
index fcb5dc982e382cc096645986389cd454f3277a6e..c8098ac8caf084b34ae4bf3fc694da1b076119b3 100644
--- a/predihood/algorithms/MyOwnClassifier.py
+++ b/predihood/algorithms/MyOwnClassifier.py
@@ -18,7 +18,7 @@ class MyOwnClassifier(Classifier):
     def fit(self, X, y):
         return self
-    def predict(self, X):
+    def predict(self, df):
         return self
     def get_params(self, deep=True):
diff --git a/setup.py b/setup.py
index 8cc17a65f197b321cc623b3a005cc10bbc1f5be2..90fb308659a363a0c39aa15b518d1aa79e4a0ace 100755
--- a/setup.py
+++ b/setup.py
@@ -5,6 +5,9 @@ import shutil
 # python3 -m setup bdist_wheel sdist
 # python3 -m pip install -e . -r requirements.txt
+print("Starting to install predihood and its dependcies.")
+print("Note that this installation may take time due to the size of the mongiris dependency (700 MB).")
+
 def delete_dir(directories):  # delete directories (ignore errors such as read-only files)
     for directory in directories:
         shutil.rmtree(directory, ignore_errors=True)