Commit aca3bfc6 authored by Nelly Barret

[AM] readme in english + paper

parent b48115af
---
title: "Predihood: an open-source tool for describing and predicting neighbourhoods' environment"
tags:
- Python
- MongoDB
......@@ -27,24 +27,23 @@ bibliography: paper.bib
---
# Statement of need
Finding a home in a new city is still a challenge. We often arrive in a city we do not know, which makes finding the ideal place to live complex. Nearby public transport or a rural landscape, a lively neighbourhood for some, a quiet area far from the urban hustle and bustle for others: there are many criteria for choosing your future neighbourhood.
Some projects have focused on qualifying neighbourhoods, such as Livehoods [@cranshaw2012livehoods] and Hoodsquare [@zhang2013hoodsquare]. The Livehoods project aims at defining and computing the dynamics of neighbourhoods based on data gathered from social networks, while the Hoodsquare project detects similar areas based on Foursquare check-ins. Compared with the many papers addressing these challenges, our contribution differs on several points. Numerous works are limited to a few cities, others introduce bias by relying on social networks, and most of them focus on quality of life. Unlike existing works, our approach covers a whole country (France), relies on reliable and frequently updated sources as well as on a social science study, and focuses on the environment of neighbourhoods.
To describe the environment of a neighbourhood as accurately as possible, social science researchers have defined six environment variables, each with a limited number of values. These six variables are the _building type_, the _building usage_, the _landscape_, the _social class_, the _morphological position_ and the _geographical position_. For example, the _landscape_ can be _urban_, _green areas_, _forest_ or _countryside_, while the _social class_ ranges from _lower_ to _upper_. These variables are commonly accepted and easy to understand and use. The remaining challenge is to describe every neighbourhood of a whole country with these six variables. To tackle it, our objective is to predict the environment variables of any neighbourhood by supervised learning.
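The six environment variables can be represented as a small schema. This is a minimal sketch, not Predihood's own data model. The snake_case identifiers are hypothetical. Only the _landscape_ values are encoded, because the values of the other variables are not listed here; the _social class_ is only said to range from _lower_ to _upper_.

```python
from enum import Enum

# Hypothetical identifiers for the six environment variables.
ENVIRONMENT_VARIABLES = (
    "building_type",
    "building_usage",
    "landscape",
    "social_class",
    "morphological_position",
    "geographical_position",
)


class Landscape(Enum):
    """Values of the landscape variable, as listed in the text."""

    URBAN = "urban"
    GREEN_AREAS = "green areas"
    FOREST = "forest"
    COUNTRYSIDE = "countryside"


def parse_landscape(label: str) -> Landscape:
    """Map an assessment label to a Landscape member.

    Raises ValueError for labels outside the allowed values.
    """
    return Landscape(label.strip().lower())
```

Checking labels against a closed set of values in this way would catch typos in manual assessments before they reach the training data.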
# Methodology
To predict the environment of neighbourhoods, we first need data about them. There are two main types of data: the geometry, which describes the shape of the neighbourhood, and indicators, which quantify its environment. Each neighbourhood can be described by thousands of indicators. These indicators cannot be exploited manually, but they are useful in an automatic approach. Examples include the number of restaurants, the average income or the number of houses over 250 $m^2$. Predihood integrates such data for France by using [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), an interface for querying French administrative areas. Only French areas are currently available, but the approach can be extended to other countries.
After gathering data, the next step is to assess some neighbourhoods manually, since supervised learning requires labelled examples. Social science researchers carried out this assessment by investigating Google Street View (pictures of buildings and streets, parked cars, facilities and green areas). It takes one to two hours per neighbourhood. A total of 300 IRIS have been annotated and are used as training data.
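The cost of this manual assessment follows from the numbers above. At one to two hours per neighbourhood, annotating 300 IRIS takes

$$
300 \times 1\ \text{h} = 300\ \text{h} \quad \text{to} \quad 300 \times 2\ \text{h} = 600\ \text{h}
$$

of expert work. Assessing every neighbourhood of a country by hand is therefore impractical, which is why the remaining neighbourhoods are predicted.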
To bring assessed neighbourhoods and their indicators together in a single view, datasets have been constructed. As shown in Figure 1, each one contains the INSEE code of the neighbourhood (grey column), its indicators normalized by population density (yellow and green columns respectively), and the assessment of social science researchers for the six environment variables (blue columns). Predihood aims at automatically filling in the question marks for neighbourhoods that have not been assessed yet.
![An example of the computed dataset.](predihood-indicators.png)
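A row of such a dataset can be sketched with the standard library. This is a minimal sketch under stated assumptions, not Predihood's code. It assumes that normalizing "by density of population" means dividing each count-type indicator by the population of the IRIS. The INSEE codes and indicator names below are hypothetical. Variables without an assessment are stored as `None`, which plays the role of the question marks in Figure 1.

```python
from typing import Dict, List, Optional

# Hypothetical identifiers for the six environment variables.
ENVIRONMENT_VARIABLES = [
    "building_type", "building_usage", "landscape",
    "social_class", "morphological_position", "geographical_position",
]


def build_row(code_iris: str,
              counts: Dict[str, float],
              population: float,
              assessment: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Build one dataset row: INSEE code, normalized indicators, assessment.

    Assumption: each count-type indicator is divided by the IRIS population.
    Variables without an assessment are set to None (the "question marks").
    """
    if population <= 0:
        raise ValueError("population must be positive")
    row: Dict[str, object] = {"code_iris": code_iris}
    for name, value in counts.items():
        row[name] = value / population
    assessment = assessment or {}
    for variable in ENVIRONMENT_VARIABLES:
        row[variable] = assessment.get(variable)
    return row


# Hypothetical codes and indicator names, for illustration only.
dataset: List[Dict[str, object]] = [
    build_row("000000001", {"nb_restaurants": 40, "nb_houses_over_250m2": 2},
              population=2000, assessment={"landscape": "urban"}),
    build_row("000000002", {"nb_restaurants": 5, "nb_houses_over_250m2": 10},
              population=500),
]

# Neighbourhoods with no assessment at all are the ones left to predict.
unassessed = [row["code_iris"] for row in dataset
              if all(row[v] is None for v in ENVIRONMENT_VARIABLES)]
```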
With this unified dataset, it is now possible to predict the environment of any neighbourhood in France. Because neighbourhoods are represented by hundreds of indicators, a selection process extracts subsets of relevant indicators. These subsets, called _lists_, contain from 10 to 100 indicators each and are used in the Predihood interface to predict the environment.
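The selection process itself is not detailed here. As a stand-in, the sketch below ranks indicators by their variance across neighbourhoods and builds lists made of the top-k indicators. Variance is an unsupervised criterion chosen only for illustration, and Predihood's actual selection may differ. The list sizes follow the 10 to 100 range given above; the step of 10 is an assumption.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def rank_by_variance(rows: Sequence[Dict[str, float]],
                     indicators: Sequence[str]) -> List[str]:
    """Rank indicators by decreasing variance across neighbourhoods.

    Near-constant indicators barely distinguish neighbourhoods from each
    other, so they end up at the bottom of the ranking.
    """
    variances = {name: pvariance([row[name] for row in rows]) for name in indicators}
    return sorted(indicators, key=lambda name: variances[name], reverse=True)


def build_lists(ranking: Sequence[str], sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Return lists holding the top-k ranked indicators for each size k."""
    return {k: list(ranking[:k]) for k in sizes if k <= len(ranking)}


# Sizes between 10 and 100 indicators; the step of 10 is an assumption.
LIST_SIZES = range(10, 101, 10)
```

In this sketch, every list is a prefix of the same ranking, so smaller lists are always subsets of larger ones.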
Predihood provides a generic interface that makes tuning algorithms easier. This interface is based on [Scikit-learn](https://scikit-learn.org/stable/) algorithms but can also handle hand-made ones. To implement your own algorithm and test it on our dataset, follow these steps:
......@@ -83,7 +82,7 @@ class NewAlgorithm(Classifier):
return self
```
After that, your algorithm is ready to be used in Predihood.
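As an illustration of what a hand-made algorithm could look like, here is a minimal nearest-centroid classifier written in plain Python. It follows the scikit-learn convention that `fit` returns `self`. The class name is hypothetical and is chosen so it does not clash with scikit-learn's own `NearestCentroid`. The `Classifier` base class and its required methods are not visible in this excerpt, so the sketch does not subclass it.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class HandMadeNearestCentroid:
    """Hypothetical hand-made classifier following scikit-learn conventions.

    In Predihood it would subclass the `Classifier` base class; that class
    is not shown in this excerpt, so it is omitted here.
    """

    def __init__(self) -> None:
        self.centroids_: Dict[str, List[float]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "HandMadeNearestCentroid":
        # Group training rows by label, then average each feature per label.
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids_ = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in groups.items()
        }
        # Returning self, as scikit-learn estimators do, allows fit(...).predict(...).
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        # Each row gets the label of the closest centroid (Euclidean distance).
        return [
            min(self.centroids_, key=lambda label: math.dist(row, self.centroids_[label]))
            for row in X
        ]
```

For example, `HandMadeNearestCentroid().fit(X_train, y_train).predict(X_test)` would return one predicted value (e.g. a _landscape_ label) per test neighbourhood.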
# Mentions of Predihood
......@@ -91,11 +90,11 @@ Our approach Predihood has been presented during the DATA conference [@barretpre
This first screenshot shows the generic interface of Predihood for tuning algorithms. The left panel lets you tune parameters and hyperparameters, such as the training and test sizes. On the right, the table shows the accuracy obtained for each list (generated during the selection process) and each environment variable. These results can be exported by clicking on the download icon.
![Screenshot of algorithm interface of Predihood](predihood-accuracies.png)
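The accuracy table can be thought of as a grid indexed by list and by environment variable. The sketch below uses only the standard library. It assumes that accuracy is the share of test neighbourhoods whose predicted value matches the researchers' assessment. The list names are hypothetical, and the actual format of Predihood's export is not described here.

```python
import csv
import io
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of neighbourhoods whose predicted value matches the assessment."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracies_to_csv(table: Dict[str, Dict[str, float]], variables: Sequence[str]) -> str:
    """Export a {list name: {variable: accuracy}} table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["list", *variables])
    for list_name, scores in table.items():
        # One row per list, one column per environment variable.
        writer.writerow([list_name, *(f"{scores[v]:.2f}" for v in variables)])
    return buffer.getvalue()
```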
This screenshot shows the cartographic interface of Predihood, used mostly by people looking for a new place to live. After searching for an area with the inputs on the left and clicking on a neighbourhood, you can choose an algorithm to predict the environment variables of that neighbourhood. For beginners, the `Random Forest` classifier is recommended. For example, Alice is an IT sales representative recruited for a six-month mission in Lyon before going back to Paris. She easily compares several neighbourhoods in the CBD (Central Business District) of Lyon and chooses the "Part-Dieu" neighbourhood.
![Screenshot of the cartographic interface of Predihood](predihood-predictions.png)
# Acknowledgements
......
predihood-indicators.png (image updated: 557 KiB before, 560 KiB after)
......@@ -59,6 +59,12 @@ footer {
height: 20vh;
}
.right {
display: flex;
justify-content: right;
height: 20vh;
}
h3 {
margin-top: 1em;
margin-bottom: 1em;
......
......@@ -2,13 +2,17 @@
{% block content %}
<div class="row">
<div class="col-11">
<h3 style="margin-top: 0 !important; margin-bottom: 0 !important">
<a href="/">
<img src="{{ url_for('static', filename='img/favicon.png') }}" height="30vh">
</a>
predihood
</h3>
</div>
<div class="col-1" align="right">
<input type="button" value="Help" data-toggle="modal" data-target="#modalHelp" class="btn btn-primary">
</div>
</div>
<div class="row">
<div class="col-6" id="divSearch">
......
# Predihood
Predihood is an application for visualizing [IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523) (administrative areas defined by INSEE, the French national institute of statistics, which can be considered as neighbourhoods) and the indicators that describe them (e.g. the number of bakeries, the average income or the number of houses over 250 m²).
## Statement of need
Predihood provides an interface for searching and comparing neighbourhoods.
## Installation instructions
### Requirements
- Python, version >= 3
- [MongoDB](https://www.mongodb.com/), version >= 4, for importing the database of neighbourhoods.
### Installation
To install Predihood, type in a terminal:
```
python3 -m pip install -e predihood/ --process-dependency-links
```
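This command installs the dependencies, including [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), which provides querying of the MongoDB database containing information about neighbourhoods. Note that the `--process-dependency-links` option was removed in pip 19.0, so recent versions of pip reject it. In that case, drop the option; mongiris may then need to be installed separately.
Creating this database is mandatory. To do so, run the following command (from MongoDB's executables directory if needed):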
```
./mongorestore --archive=/path/to/dump-iris.bin
```
where `/path/to/` is the directory containing the dump of the IRIS collection (provided with the mongiris package as `mongiris/data/dump-iris.bin`).
### Run the interface
To run *Predihood*, type in a terminal:
```
python3 main.py
```
After a few log messages, the terminal displays the URL of *Predihood*: `http://localhost:8080/`. To try the cartographic interface, click on the "Search a neighbourhood" button. To configure and test your own algorithm in our interface, click on the "Tune my classifier" button.
## Example usage
For the cartographic interface, an example would be:
1. Type a query in the panel on the left, e.g. "Lyon". This displays all neighbourhoods whose name or township contains "Lyon".
2. Click on a neighbourhood (the small blue areas). A tooltip appears with some information about the neighbourhood; more information is available by clicking on the "More details" link.
3. To predict the environment variables, choose a classifier; the "Random Forest" classifier is recommended by default. After a few seconds, the predictions appear in the tooltip, which helps you compare neighbourhoods.
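The matching rule in step 1 can be illustrated with a small sketch over in-memory records. The field names `name` and `township` and the sample records are hypothetical, and the case-insensitive matching is an assumption. The sketch does not reproduce how Predihood actually queries its MongoDB database.

```python
from typing import Dict, List

# Hypothetical records; in Predihood, IRIS data come from the mongiris database.
AREAS: List[Dict[str, str]] = [
    {"name": "Part-Dieu", "township": "Lyon"},
    {"name": "Vieux Port", "township": "Marseille"},
    {"name": "Quartier des Lyonnais", "township": "Villeurbanne"},
]


def search_areas(query: str, areas: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return areas whose name or township contains the query, ignoring case."""
    needle = query.strip().casefold()
    return [
        area for area in areas
        if needle in area["name"].casefold() or needle in area["township"].casefold()
    ]
```

Because the match is a substring test, a query such as "Lyon" also returns areas whose name merely contains it, like the third hypothetical record.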
For the algorithmic interface, an example would be:
1. Choose an algorithm
## Community guidelines
## Functionality
## Tests