- name: LIRIS UMR5205, Université Claude Bernard Lyon 1, Lyon, France
  index: 1
date: 16 September 2020
bibliography: paper.bib
---
...
...
# Statement of need
Several projects focus on qualifying neighbourhoods using social networks. For instance, the Livehoods project defines and computes the dynamics of neighbourhoods [@cranshaw2012livehoods], while the Hoodsquare project detects similar areas based on Foursquare check-ins [@zhang2013hoodsquare]. Crowd-based systems are interesting but may be biased. [DataFrance](https://datafrance.info/) is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)), and price surveys from newspapers (L'Express). DataFrance enables the visualization of hundreds of indicators, but makes it difficult to judge the environment of a neighbourhood.
# Methodology
...
...
## Adding new algorithms
Because the prediction of these environment variables is a complex task, testing several algorithms and comparing their results may help increase the overall quality. To facilitate this task, Predihood proposes a generic and easy-to-use programming structure for machine learning algorithms, based on [Scikit-learn](https://scikit-learn.org/stable/) algorithms. Thus, experts can implement hand-made algorithms and run experiments in Predihood. Adding a new algorithm only requires four steps:
1. Create a new class that represents your algorithm, e.g. `MyOwnClassifier`, and inherits from `Classifier`.
2. Implement the core of your algorithm by coding the `fit()` and `predict()` functions. The `fit` function fits your classifier on assessed neighbourhoods, while the `predict` function predicts environment variables for a given neighbourhood.
3. Add `get_params()` to be compatible with the Scikit-learn framework.
4. Comment your classifier using the Numpy docstring style so that it can be tuned in the interface.
Below is a very simple example illustrating the aforementioned steps. Note that your algorithm is automatically loaded in Predihood.
```python
# file ./algorithms/MyOwnClassifier.py
from predihood.classes.Classifier import Classifier  # import path assumed

class MyOwnClassifier(Classifier):
    """Example classifier; document parameters in Numpy style.

    Parameters
    ----------
    a : int, default=0
    b : float, default=0.0
    """

    def __init__(self, a=0, b=0.0):
        self.a, self.b = a, b

    def fit(self, X, y):
        # fit the classifier on assessed neighbourhoods
        return self

    def predict(self, X):
        # predict environment variables for the given neighbourhoods
        return []

    def get_params(self, deep=True):
        return {"a": self.a, "b": self.b}
```
After that, your algorithm is ready to be used in Predihood.
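To make the `fit`/`predict`/`get_params` contract concrete, here is a self-contained sketch: a stand-in base class and a trivial majority-vote implementation. Everything besides those three method names is illustrative, not Predihood's actual API.

```python
class Classifier:
    """Minimal stand-in for Predihood's Classifier base class."""

class MajorityClassifier(Classifier):
    def __init__(self, a=0, b=0.0):
        self.a, self.b = a, b

    def fit(self, X, y):
        # majority-vote baseline: memorise the most frequent label
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        # predict the memorised label for every neighbourhood
        return [self.majority_ for _ in X]

    def get_params(self, deep=True):
        return {"a": self.a, "b": self.b}

clf = MajorityClassifier(a=1).fit([[0], [1], [2]], ["houses", "houses", "towers"])
print(clf.predict([[3]]))   # ['houses']
print(clf.get_params())     # {'a': 1, 'b': 0.0}
```

A real algorithm would of course learn from the indicator values in `X` rather than from label frequencies alone, but this baseline is enough to exercise the interface.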
In addition, Predihood provides an interface for easily tuning and testing algorithms on a dataset, as shown in Figure 3. The left panel allows experts to select an algorithm and tune its parameters and hyperparameters, such as training and test sizes. On the right, the table shows the accuracies obtained for each list of indicators (generated during the selection process) and each environment variable. Results can be exported as CSV.
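The accuracy figures in that table boil down to the share of correct predictions on a held-out test set. A minimal pure-Python sketch of that computation, with made-up toy data standing in for Predihood's indicators and a majority-vote baseline standing in for a real classifier:

```python
import random

# toy stand-ins for indicator vectors and one assessed environment variable
data = [([i, i % 3], "houses" if i % 3 else "towers") for i in range(60)]

# hold out a test set (the test size is one of the tunable parameters)
random.seed(0)
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# trivial baseline: always predict the most frequent training label
labels = [label for _, label in train]
majority = max(set(labels), key=labels.count)

# accuracy: share of test neighbourhoods whose label is predicted correctly
correct = sum(majority == label for _, label in test)
accuracy = correct / len(test)
print(f"accuracy: {accuracy:.0%}")
```

Predihood reports one such accuracy per environment variable and per list of selected indicators, which is what fills the cells of the results table.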

# Mentions of Predihood
Our Predihood tool has been presented at the DATA conference [@barretpredicting]. Prediction results using six algorithms from Scikit-learn range from 30% to 65% depending on the environment variable, and designing new algorithms could help improve these results.
The project is available here: [https://gitlab.liris.cnrs.fr/fduchate/predihood](https://gitlab.liris.cnrs.fr/fduchate/predihood).