From b9418bb2d12c3fd855714e4dbfafb499d991246a Mon Sep 17 00:00:00 2001
From: Duchateau Fabien <fabien.duchateau@univ-lyon1.fr>
Date: Sat, 22 Aug 2020 15:41:53 +0200
Subject: [PATCH] [M] wrote methodology + subsection 1

---
 paper.md | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/paper.md b/paper.md
index ea6b0df2..5ab9c27c 100644
--- a/paper.md
+++ b/paper.md
@@ -27,35 +27,38 @@ bibliography: paper.bib
 
 # Introduction
 
-Finding a real estate in a new city is a real challenge. We often arrive in a city we do not know, and finding the perfect living area becomes complex. Nearby public transportation on one hand, rural landscape on the other hand, an animated neighbourhood for some, far from urban hustle and bustle for others: there are many criteria for choosing a future neighbourhood.
+Finding real estate in a new city is a real challenge. We often arrive in a city we do not know, and finding the perfect living area becomes complex. Nearby public transportation on the one hand, rural landscape on the other, a lively neighbourhood for some, far from urban hustle and bustle for others: there are many criteria for choosing a future neighbourhood. Our tool Predihood makes it possible to define neighbourhoods with a set of indicators and to predict their environment using supervised learning.
 
 # Statement of need
 
-Several projects focus on qualifying neighbourhoods. The Livehoods project aims at defining and computing dynamics of neighbourhoods based on data gathered from social networks [@cranshaw2012livehoods]. The Hoodsquare project detects similar areas based on Foursquare check-ins [@zhang2013hoodsquare]. Crowd-based systems are interesting but may be biased. [DataFrance](https://datafrance.info/) is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)) and surveys from newspapers for prices (L'Express). DataFrance enables the visualization of hundreds of indicators, but makes it difficult to judge on the environement of a neighbourhood.
-
+Several projects focus on qualifying neighbourhoods using social networks. For instance, the Livehoods project defines and computes the dynamics of neighbourhoods [@cranshaw2012livehoods], while the Hoodsquare project detects similar areas based on Foursquare check-ins [@zhang2013hoodsquare]. Crowd-based systems are interesting but may be biased. [DataFrance](https://datafrance.info/) is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)) and newspaper surveys for prices (L'Express). DataFrance enables the visualization of hundreds of indicators, but makes it difficult to judge the environment of a neighbourhood.
 
 # Methodology
 
-Our approach Predihood aims at facilitating the comparison between neighbourhoods. It defines and predicts the environment of any neighbourhood in France using supervised learning.
+To describe the environment of a neighbourhood as accurately as possible, social science researchers have defined six environment variables, each with a limited number of values [@barretpredicting]. These six variables are the _building type_, the _building usage_, the _landscape_, the _social class_, the _morphological position_ and the _geographical position_. As an example, the _landscape_ can be evaluated as _urban_, _green areas_, _forest_ or _countryside_, while the _social class_ has values ranging from _lower_ to _upper_. These variables are commonly accepted and easily understandable.
 
-contributions : either for adding new data (neighbourhoods from another country) or for adding predictive algorithms
+Predihood provides the following functionalities:
+- adding new neighbourhoods and indicators to describe them;
+- predicting the environment of a neighbourhood by configuring and using predefined algorithms;
+- adding new predictive algorithms.
 
-## Describing neighbourhoods
+## Adding new data
 
-In order to describe in the most accurate way the environment of a neighbourhood, social science researchers have defined six environment variables with a limited number of values for each one. These six variables are the _building type_, the _building usage_, the _landscape_, the _social class_, the _morphological position_ and the _geographical position_. As an example, the _landscape_ can be evaluated as _urban_, _green areas_, _forest_ or _countryside_ while the _social class_ have values from _lower_ to _upper_. These variables are commonly accepted and easily understandable.
+Neighbourhoods are represented as [GeoJSON objects](https://geojson.org/) and include:
 
-## Predicting neighbourhoods
+- a geometry (multi-polygons), which describes the shape of the neighbourhood;
+- properties, with descriptive information (e.g., name, city postcode) and indicators which quantify the environment (e.g., number of restaurants or bakeries, average income, unemployment rate, number of houses with a surface area above 250 $m^2$). These hundreds of indicators are used to predict the values of the environment variables.
 
-There are mainly four steps: producing supervised neighbourhoods, collecting data about neighbourhoods, compute datasets and finally running algorithms to predict environment.
+To add new neighbourhoods, it is necessary to store them as GeoJSON and make them accessible to Predihood. In addition, some neighbourhoods have to be manually annotated (i.e., given a value for each of the six environment variables).
 
-The first step of producing supervised neighbourhoods is a manual task that has been done by social science researchers. This task consists of giving a value for each environment variable. This has been done by investigating Google Street View (building and streets pictures, parked cars, facilities and greens areas) and requires between one to two hours for a single neighbourhood. A total of 300 neighbourhoods have been annotated.
+The current version of Predihood is bundled with data from France using the [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris) project.
+It includes about 50,000 neighbourhoods with 640 indicators, of which 300 neighbourhoods were annotated by social science researchers (one to two hours per neighbourhood to examine pictures of buildings and streets, parked cars, facilities and green areas using services such as Google Street View).
 
-The second step is about collecting data that represents neighbourhoods. There are mainly two types of data:
+## Predicting neighbourhoods
+
+There are four main steps: producing supervised neighbourhoods, collecting data about neighbourhoods, computing datasets and finally running algorithms to predict the environment.
 
-- The geometry, stored as a GeoJSON object, which describe the shape of the neighbourhood.
-- The indicators which quantify the environment. Each neighbourhood can be described by thousands of indicators, such as the number of restaurants, the average income or even the number of houses over 250 $m^2$. Even if it is not possible to manually exploit these indicators, they are useful in an automatic approach.
 
-Predihood integrates such data for France by using [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), an interface for querying French administrative areas. Predihood is instantiated only for French data, but this can be easily extended to other countries.
 
 The third step aims at computing datasets that will aggregate aforementioned data. A dataset looks like Figure 1 and is composed of the code INSEE of the neighbourhood (grey column), its indicators (yellow columns) that have been normalized by density of population (green column) and the assessment of social science researchers for the six environment variables (blue columns). As a reminder, our approach Predihood aims at automatically filling question marks for neighbourhoods that are not yet assessed.
 
-- 
GitLab