diff --git a/README.md b/README.md index 4c390ac1e18dbaab1b78a6be94395919ca8659ba..198ce18b1d3f7eea05c7d9f7ff0a0cd813ad2e45 100644 --- a/README.md +++ b/README.md @@ -1,47 +1,60 @@ -#  predihood +# Predihood -Cette application permet de visualiser les [IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523) (zones administratives définies par l'INSEE, un peu similaires aux quartiers, environ 50000 IRIS sur le terrtitoire français) et les indicateurs qui les décrivent (e.g., nombre de boulangeries, nombre et type d'établissements scolaires, pourcentage d'habitant.e.s selon les catégories socio-professionnelles). +Predihood is an application for visualizing [IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523) (administrative areas defined by the French institute of statistics, they can be considered as neighbourhoods) and indicators which describe them (e.g. number of bakeries, average income and even the number of houses over 250m^2). -L'outil *mapiris* permet de chercher les IRIS par code, par nom (d'IRIS ou de commune) et d'afficher les IRIS sur une zone géographique donnée. +## Statement of need -<img src="/predihood/static/img/screenshot-mapiris.jpg?raw=true" alt="Capture mapiris" width="100%"> +Finding a real estate in a new city is still a challenge. We often arrive in a city we don't know, thus finding the perfect living place becomes complex. Nearby public transport on one hand, a rural landscape on the other hand, an animated neighbourhood for some, far from urban hustle and bustle for others: there are many criteria for choosing your future neighbourhood. Our approach Predihood aims at facilitating the comparison between neighbourhoods. It defines and predicts the environment of any neighbourhood in France using supervised learning. -## Pré-requis +## Installation instructions +### Requirements - Python, version >=3 -- [MongoDB](https://www.mongodb.com/), version >=4, pour lequel il faudra importer la base de données des IRIS (cf installation). 
+- [MongoDB](https://www.mongodb.com/), version >=4, into which the database about neighbourhoods must be imported (see Installation). -## Installation +### Installation -Pour installer *predihood*, taper dans un terminal : +To install Predihood, type in a terminal: ``` python3 -m pip install -e predihood/ --process-dependency-links ``` -Cette commande installe les dépendances, dont [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris) qui permet l'interrogation d'une base de données sous MongoDB contenant les IRIS. +This command installs the dependencies, including [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), which allows querying the MongoDB database containing information about neighbourhoods. -Il est donc nécessaire de crééer la base de données avec la collection d'IRIS. Pour cela, exécuter la commande (depuis le répertoire des exécutables de MongoDB si besoin) : +Creating this database is mandatory. To do so, execute this command (from MongoDB's executables directory if needed): ``` ./mongorestore --archive=/path/to/dump-iris.bin ``` -où `/path/to/` représente le chemin vers le fichier dump de la collection des IRIS (fourni de base avec le package mongiris dans `mongiris/data/dump-iris.bin`) +where `/path/to/` is the path to the dump file of the IRIS collection (provided with the package mongiris in `mongiris/data/dump-iris.bin`). -## Lancement de l'interface +### Run Predihood -Pour lancer *predihood*, taper dans un terminal : +To run *Predihood*, type in a terminal: ``` python3 main.py ``` -Après quelques informations, le terminal affiche l'URL permettant de tester *predihood* : `http://localhost:8080/` +After printing some information, the terminal displays the URL for testing *Predihood*: `http://localhost:8080/`. To try the cartographic interface, click on the button "Search a neighbourhood". To configure and test your own algorithm in our interface, click on the button "Tune my classifier". 
-## Crédits +## Example usage -Source de données : [INSEE](https://www.insee.fr/) +For the cartographic interface, an example would be: -Contributeurs : laboratoire [LIRIS](https://liris.cnrs.fr/), laboratoire [CMW](https://www.centre-max-weber.fr/) et Labex [Intelligence des Mondes Urbains (IMU)](http://imu.universite-lyon.fr/projet/hil) +1. Type a query in the panel on the left, e.g. "Lyon". This will display all neighbourhoods that contain "Lyon" in their name or their township. +2. Click on a neighbourhood (the small areas in blue). A tooltip will appear with some information about the neighbourhood. More information is available by clicking on the "More details" link. +3. In order to predict the environment variables, you have to choose a classifier. The "Random Forest" classifier is recommended by default. After a few seconds, predictions will appear in the tooltip. This will help you compare neighbourhoods. +For the algorithmic interface, an example would be: + +1. Choose an algorithm +2. Tune it as desired +3. Click on the "Train, test and evaluate" button. Once the accuracies have been computed, a table shows results for each environment variable and each list of indicators. + + +## Tests + +Tests are in the `tests.py` file. 
\ No newline at end of file diff --git a/paper.bib b/paper.bib index 8c255ae6922a6c853320dcad89fc18199fc0d9bb..76d214a4967e5e59bbcbed3469f39533283920d2 100644 --- a/paper.bib +++ b/paper.bib @@ -17,5 +17,6 @@ @article{barretpredicting, title={Predicting the enviornment of a neighbourhood: a use case for France}, - author={Barret, Nelly and Duchateau, Fabien and Favetta, Franck and Bonneval, Loic} + author={Barret, Nelly and Duchateau, Fabien and Favetta, Franck and Bonneval, Loic}, + year={2020} } \ No newline at end of file diff --git a/paper.md b/paper.md index 2911a9e920fc72d571ec2ce689b0d846a8388726..b723d20890c1638fc050e183736df8f34778f6fd 100644 --- a/paper.md +++ b/paper.md @@ -13,47 +13,64 @@ authors: affiliation: 1 - name: Fabien Duchateau orcid: 0000-0001-6803-917X - affiliation: 2 + affiliation: 1 - name: Franck Favetta orcid: 0000-0003-2039-3481 - affiliation: 2 + affiliation: 1 affiliations: - - name: Université Claude Bernard Lyon 1, Lyon, France - index: 1 - name: LIRIS UMR5205, Université Claude Bernard Lyon 1, Lyon, France - index: 2 + index: 1 date: 16 July 2020 bibliography: paper.bib --- -# Statement of need 1 +# Introduction + +Finding a real estate in a new city is still a challenge. We often arrive in a city we don't know, thus finding the perfect living place becomes complex. Nearby public transport on one hand, a rural landscape on the other hand, an animated neighbourhood for some, far from urban hustle and bustle for others: there are many criteria for choosing your future neighbourhood. -Finding a real estate in a new city is still a challenge. We often arrive in a city we don't know, thus finding the perfect living place becomes complex. Nearby public transport on one hand, a rural landscape on the other hand, an animated neighbourhood for some, far from urban hustle and bustle for others: there are many criteria for choosing your future neighbourhood. 
+# Statement of need -Some projects have been focused on qualifying neighbourhoods, such as Livehoods [@cranshaw2012livehoods] and Hoodsquare [@zhang2013hoodsquare]. The Livehoods project aims at defining and computing dynamics of neighbourhoods based on data gathered from social networks while the Hoodsquare project detects similar areas based on Foursquare check-ins. Regarding a lot of papers about these challenges, our contribution differs on several points. Numerous works are limited to a few cities, some others introduce bias by using social networks and finally, the majority of works are focusing on life quality. Contrary to existing works, our approach works for a whole country (namely in France), is based on reliable and frequently updated sources and a social study and is focused on the environment of neighbourhoods. +Some projects focus on qualifying neighbourhoods, such as Livehoods [@cranshaw2012livehoods], Hoodsquare [@zhang2013hoodsquare] and [DataFrance](https://datafrance.info/). The Livehoods project aims at defining and computing dynamics of neighbourhoods based on data gathered from social networks. The Hoodsquare project detects similar areas based on Foursquare check-ins. DataFrance is an interface that integrates data from several sources, such as indicators provided by the National Institute of Statistics ([INSEE](https://insee.fr/en/accueil)), geographical information from the National Geographic Institute ([IGN](http://www.ign.fr/institut/activites/geoservices-ign)) and price surveys from newspapers (L'Express). -In order to describe in the most accurate way the environment of a neighbourhood, social science researchers have defined six environment variables with a limited number of values for each one. These six variables are the _building type_, the _building usage_, the _landscape_, the _social class_, the _morphological position_ and the _geographical position_. 
As an example, the _landscape_ can be evaluated as _urban_, _green areas_, _forest_ or _countryside_ while the _social class_ have values from _lower_ to _upper_. These variables are commonly accepted and easy to understand and use. There is still a challenge about describing each neighbourhood in a whole country with these six variables. To tackle this, our objective is to predict by supervised learning the environment variables whatever the neighbourhood. # Methodology -For predicting the environment of neighbourhoods, we have to gather data about them. There are mainly two types of data: the geometry which describe the shape of the neighbourhood and indicators that quantify the environment. Each neighbourhood can be described by thousands of indicators. Even if it is not possible to manually exploit these indicators, they are useful in an automatic approach. For example, there are the number of restaurants, the average income or even the number of houses over 250 $m^2$. Predihood integrates such data for France by using [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), an interface for querying French administrative areas. There are only data about French areas, but this can be extended to other countries. +Our approach Predihood aims at facilitating the comparison between neighbourhoods. It defines and predicts the environment of any neighbourhood in France using supervised learning. + +## Describing neighbourhoods + +In order to describe the environment of a neighbourhood as accurately as possible, social science researchers have defined six environment variables with a limited number of values for each one. These six variables are the _building type_, the _building usage_, the _landscape_, the _social class_, the _morphological position_ and the _geographical position_. As an example, the _landscape_ can be evaluated as _urban_, _green areas_, _forest_ or _countryside_ while the _social class_ takes values from _lower_ to _upper_. 
+These variables are commonly accepted and easily understandable. + +## Predicting neighbourhoods + +There are four main steps: producing annotated neighbourhoods, collecting data about neighbourhoods, computing datasets and finally running algorithms to predict the environment. + +The first step, producing annotated neighbourhoods, is a manual task carried out by social science researchers. It consists of assigning a value to each environment variable. This was done by inspecting Google Street View (pictures of buildings and streets, parked cars, facilities and green areas) and requires between one and two hours for a single neighbourhood. A total of 300 neighbourhoods have been annotated. + +The second step collects data that represents neighbourhoods. There are two main types of data: -After gathering data, the next step is to assess some neighbourhoods because of the supervised learning approach. This manual assessment has been realized by social science researchers. This have been done by investigating Google Street View (building and streets pictures, parked cars, facilities and greens areas) and requires between one to two hours for a single neighbourhood. A total of 300 IRIS have been annotated, which will be used as training data. +- The geometry, stored as a GeoJSON object, which describes the shape of the neighbourhood. +- The indicators, which quantify the environment. Each neighbourhood can be described by thousands of indicators, such as the number of restaurants, the average income or even the number of houses over 250 $m^2$. Even though these indicators cannot be exploited manually, they are useful in an automatic approach. -In order to unify the view between assessed neighbourhoods and their indicators, datasets have been constructed. 
They look like Figure 1 and are composed of the code INSEE of the neighbourhood (grey column), its indicators (yellow columns) that have been normalized by density of population (green column) and the assessment of social science researchers for the six environment variables (blue columns). Our approach Predihood aims at automatically filling question marks for neighbourhoods that are not yet assessed. +Predihood integrates such data for France by using [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris), an interface for querying French administrative areas. Predihood is instantiated only for French data, but this can be easily extended to other countries. + +The third step aims at computing datasets that will aggregate aforementioned data. A dataset looks like Figure 1 and is composed of the code INSEE of the neighbourhood (grey column), its indicators (yellow columns) that have been normalized by density of population (green column) and the assessment of social science researchers for the six environment variables (blue columns). As a reminder, our approach Predihood aims at automatically filling question marks for neighbourhoods that are not yet assessed.  -It is now possible to predict the environment of any neighbourhood in France using our unified dataset. Because neighbourhoods are represented by hundreds of indicators, a selection process selects subsets of relevant indicators. These subsets are called _lists_ and contain from 10 to 100 indicators. They are used in the Predihood interface to predict environment. -Predihood proposes a generic interface for tuning algorithms more easily. This interface is based on [Scikit-learn](https://scikit-learn.org/stable/) algorithms but can handle hand-made ones. To implement your own algorithm and test it on our dataset, follow these steps: +The last step predicts the environment of any neighbourhood in France. 
Because neighbourhoods are represented by hundreds of indicators, a selection process selects subsets of relevant indicators. These subsets are called _lists_ and contain from 10 to 100 indicators. They are used in the Predihood interface to predict the environment. + +## Algorithmic interface + +Because the prediction of these variables is a complex task, we have to test several algorithms to compare results. In order to facilitate the tuning and use of the algorithms, Predihood provides a generic and easy-to-use interface. This interface is based on [Scikit-learn](https://scikit-learn.org/stable/) algorithms but can handle hand-made ones. To implement your own algorithm and test it on our dataset, follow these steps: -1. Create a new class that represents your algorithm, e.g. `MyOwnClassifier` and inherits from `Classifier`. -2. Then, implement the core of your algorithm by coding `fit()` and `predict()` functions. The `fit` function aims at fitting your classifier on assessed neighbourhoods while the `predict` function aims at predicting environment variables for a given neighbourhood. -3. Next, add `get_params()` to be compatible with Scikit-learn framework. -5. Finally, do not forget to comment your classifier with the Numpy style if you want to tune it. +1. Create a new class that represents your algorithm, e.g. `MyOwnClassifier`, and inherits from `Classifier`. +2. Implement the core of your algorithm by coding the `fit()` and `predict()` functions. The `fit` function fits your classifier on assessed neighbourhoods while the `predict` function predicts environment variables for a given neighbourhood. +3. Add `get_params()` to be compatible with the Scikit-learn framework. +4. Document your classifier with a NumPy-style docstring so that it can be tuned in the interface. -Below, there is a very simple example to illustrate the aforementioned steps. +Below is a very simple example to illustrate the aforementioned steps. 
```python # file ./algorithms/MyOwnClassifier.py @@ -61,7 +78,7 @@ from predihood.classes.Classifier import Classifier class MyOwnClassifier(Classifier): - """Some text. + """Description of the classifier. Parameters ------------ a : float, default=0.01 @@ -87,18 +104,22 @@ class MyOwnClassifier(Classifier): After that, your algorithm is ready to be used in Predihood. -# Mentions of Predihood - -Our approach Predihood has been presented during the DATA conference [@barretpredicting]. +Figure 2 shows the generic interface of Predihood for tuning algorithms. The left panel lets you tune parameters and hyperparameters, such as training and test sizes. On the right, the table shows the accuracies obtained for each list (generated during the selection process) and each environment variable. You can export these results by clicking on the download icon. -This first screenshot shows the generic interface of Predihood for tuning algorithms. The left panel allows to tune parameters and hyper parameters, such as training and test sizes. On the right, the table illustrates the accuracies obtained for each lists (generated during the selection process) and each environment variable. You can export these results by clicking on the download icon. + - +## Cartographic interface -This screenshot exposes the cartographic interface of Predihood, used mostly by people who search for a new living place. By searching an area in the inputs on the left and then clicking on neighbourhoods, you will be able to choose an algorithm to predict environment variables of the chosen neighbourhood. For beginners, `Random Forest` classifier is recommended. For example, Alice is an IT commercial and has been recruited for a mission in Lyon for 6 months before going back to Paris. She compares easily many neighbourdhoods in the CBD (Central Business District) of Lyon and chooses the "Part-Dieu" neighbourhood. 
+Figure 3 shows the cartographic interface of Predihood, aimed mostly at people searching for a new place to live. By searching for an area using the inputs on the left and then clicking on a neighbourhood, you can choose an algorithm to predict the environment variables of that neighbourhood. For beginners, the `Random Forest` classifier is recommended. For example, Alice is an IT sales representative who has been recruited for a 6-month mission in Lyon before going back to Paris. She easily compares many neighbourhoods in the CBD (Central Business District) of Lyon and chooses the "Part-Dieu" neighbourhood.  +# Mentions of Predihood + +Our approach Predihood has been presented during the DATA conference [@barretpredicting]. +Results vary from 30% to 65% depending on the environment variable, but proposing new algorithms can help to improve these results. + + # Acknowledgements This work has been partially funded by LABEX IMU (ANR-10-LABX-0088) from Université de Lyon, in the context of the program "Investissements d'Avenir" (ANR-11-IDEX-0007) from the French Research Agency (ANR). 
diff --git a/predihood-indicators.png b/predihood-indicators.png index 58c84fff79341abc223099227488ceb31c435ac0..2334f33672fbe4f808f1f72208a86017cbd85133 100644 Binary files a/predihood-indicators.png and b/predihood-indicators.png differ diff --git a/predihood/classes/MethodPrediction.py b/predihood/classes/MethodPrediction.py index aa15ddc9bbe826a7916a5f66dc7af17146498063..2295909288cc2692fb07d24dc29b1a812b1a3110 100644 --- a/predihood/classes/MethodPrediction.py +++ b/predihood/classes/MethodPrediction.py @@ -54,7 +54,6 @@ class MethodPrediction(Method): iris_indicators_names.append(indicator) elif self.dataset.normalization == "population": for indicator in self.dataset.selected_indicators: - # if indicator == "P14_POP": continue # skip this indicator because of normalisation # TODO if indicator in iris_object["properties"]["raw_indicators"] and iris_population > 0: iris_indicators_values.append(float(iris_object["properties"]["raw_indicators"][indicator]) / iris_population) else: diff --git a/predihood/main.py b/predihood/main.py index b868c690972ccb2dc72cda2bf50e969edd4aec5f..7b24e0ed1b8c998732fa68c46d29bab029ceafc5 100644 --- a/predihood/main.py +++ b/predihood/main.py @@ -39,7 +39,6 @@ def index(page): predihood.config.PREFERRED_LANGUAGE = request.args["lang"] else: predihood.config.PREFERRED_LANGUAGE = "english" - print(predihood.config.PREFERRED_LANGUAGE) return render_template("index.html", language=predihood.config.PREFERRED_LANGUAGE) diff --git a/predihood/predict.py b/predihood/predict.py index e3a277066896e54f0ef9d789be478a8f87da0603..d0c2e278bb567294f81101430daf590fdb11bf96 100644 --- a/predihood/predict.py +++ b/predihood/predict.py @@ -129,18 +129,16 @@ def predict_one_iris(iris_code, data, clf, train_size, test_size, remove_outlier algorithm = MethodPrediction(name='', dataset=dataset, classifier=clf) algorithm.fit() algorithm.predict(iris_code) - print(predihood.config.PREFERRED_LANGUAGE) if predihood.config.PREFERRED_LANGUAGE == "french": 
predicted_value = list(ENVIRONMENT_VALUES[env].keys())[list(ENVIRONMENT_VALUES[env].values()).index(algorithm.prediction)] # get french translation of the predicted value else: predicted_value = algorithm.prediction predictions_lst.append(predicted_value) - print(predictions_lst) if predihood.config.PREFERRED_LANGUAGE == "french": predictions[ENVIRONMENT_VARIABLES_FR[env]] = get_most_frequent(predictions_lst) else: predictions[env] = get_most_frequent(predictions_lst) # get the most frequent value and the number of occurrences - print(predictions) # TODO: give an example of the dictionary + print(predictions) # {'building_type': {'most_frequent': 'Towers', 'count_frequent': 7}, 'building_usage': {'most_frequent': 'Housing', 'count_frequent': 4}, ... } return predictions diff --git a/predihood/static/js/algorithms.js b/predihood/static/js/algorithms.js index 267e946941fec7413c2e90c98f0d68dd73340909..2dc9343daf89c89b17c4432e7c15b592bc47d614 100644 --- a/predihood/static/js/algorithms.js +++ b/predihood/static/js/algorithms.js @@ -5,8 +5,10 @@ let generalParameters = ["class_weight", "cv", "kernel", "max_iter", "memory", " let trainPercentage = $("#trainPercentage").val(); // to update test percentage depending on train percentage let testPercentage = $("#testPercentage").val(); // to update train percentage depending on test percentage let request_run = null; // the request send to the server (with classifier and its parameters) -const MAX_PARAMETERS = 5; +const MAX_PARAMETERS = 2; let preferred_language_algo = get_preferred_language(); + + // get parameters of the selected classifier and display them in the interface. $("#selectAlgorithm").change(function () { let algorithm_name = $(this).children("option:selected").val(); @@ -54,166 +56,156 @@ $("#testPercentage") }); // prevent user from clicking on the input // run the classifier with specified parameters and display results in the results section. 
-$("#runBtn") - .click("on", function () { - $("body").css("cursor", "progress"); - $(".wrapperTable input[type='checkbox']:not(:checked)").each(function () { - $(this).parent().parent().empty(); // remove tables that are not checked in the interface - }); - let userParameters = {}; - let chosen_clf = $("#formAlgorithm")[0].elements[0].value; - for (let key in $("#formParameters")[0].elements) { - let elem = $("#formParameters")[0].elements[key]; - if (parseInt(key) === undefined || isNaN(parseInt(key))) { - continue; - } // key is not an element of the form - // console.log(elem.title + " : " + elem.value + " / " + current_parameters[elem.title]["default"]); - if (elem.value !== current_parameters[elem.title]["default"] && elem.value !== "") { // get only parameters filled by user - let label_name = elem.title; - let val = elem.value; - if (elem.type === "text") { // input with text type - if (elem.title.includes("int") && parseInt(elem.value)) { - val = parseInt(elem.value) - } else if (elem.title.includes("float") && parseFloat(elem.val)) { - val = parseFloat(elem.value) - } - userParameters[label_name] = val; - } else if (elem.type === "number") { // input with number type - userParameters[label_name] = parseFloat(elem.value); - } else if (elem.type === "checkbox") { // input with checkbox type - userParameters[label_name] = elem.checked; +$("#runBtn").click("on", function () { + $("body").css("cursor", "progress"); + $(".wrapperTable input[type='checkbox']:not(:checked)").each(function () { + $(this).parent().parent().empty(); // remove tables that are not checked in the interface + }); + let userParameters = {}; + let chosen_clf = $("#formAlgorithm")[0].elements[0].value; + for (let key in $("#formParameters")[0].elements) { + let elem = $("#formParameters")[0].elements[key]; + if (parseInt(key) === undefined || isNaN(parseInt(key))) { + continue; + } // key is not an element of the form + // console.log(elem.title + " : " + elem.value + " / " + 
current_parameters[elem.title]["default"]); + if (elem.value !== current_parameters[elem.title]["default"] && elem.value !== "") { // get only parameters filled by user + let label_name = elem.title; + let val = elem.value; + if (elem.type === "text") { // input with text type + if (elem.title.includes("int") && parseInt(elem.value)) { + val = parseInt(elem.value) + } else if (elem.title.includes("float") && parseFloat(elem.value)) { + val = parseFloat(elem.value) + } + userParameters[label_name] = val; + } else if (elem.type === "number") { // input with number type + userParameters[label_name] = parseFloat(elem.value); + } else if (elem.type === "checkbox") { // input with checkbox type + userParameters[label_name] = elem.checked; } } - userParameters["train_size"] = $("#trainPercentage")[0].valueAsNumber; - userParameters["test_size"] = $("#testPercentage")[0].valueAsNumber; - userParameters["remove_outliers"] = $("#removeOutliers").prop("checked"); - userParameters["remove_rural"] = $("#removeRural").prop("checked"); - console.log(chosen_clf); - console.log(userParameters); - request_run = $.ajax({ - "type": "GET", - "url": "/run", - //"async": false, - data: { - "clf": chosen_clf, - "parameters": JSON.stringify(userParameters) - }, - success: function (result) { - // each result is displayed with : - // - a checkbox to keep the results available in the next run - // - the results table, highlighted cells are best means for each EV - // - the list of parameters associated to the results - // - the mean for this classifier (all EV combined) - let keep = $("<label class='h5'><input type='checkbox' style='margin-right: 1rem;'/>" + chosen_clf + "</label>"); - let table = $("<table id='tableToExport'>").addClass("table table-hover table-responsive").append($("<tbody>")); - let results = result["results"] + } + userParameters["train_size"] = $("#trainPercentage")[0].valueAsNumber; + userParameters["test_size"] = $("#testPercentage")[0].valueAsNumber; +
userParameters["remove_outliers"] = $("#removeOutliers").prop("checked"); + userParameters["remove_rural"] = $("#removeRural").prop("checked"); + console.log(chosen_clf); + console.log(userParameters); + request_run = $.ajax({ + "type": "GET", + "url": "/run", + "async": false, + data: { + "clf": chosen_clf, + "parameters": JSON.stringify(userParameters) + }, + success: function (result) { + // each result is displayed with : + // - a checkbox to keep the results available in the next run + // - the results table, highlighted cells are best means for each EV + // - the list of parameters associated to the results + // - the mean for this classifier (all EV combined) + let keep = $("<label class='h5'><input type='checkbox' style='margin-right: 1rem;'/>" + chosen_clf + "</label>"); + let table = $("<table id='tableToExport'>") + .addClass("table table-hover table-responsive") + .append($("<tbody>")); + let results = result["results"] - // header of table: None, 10, 20, ..., 100, Mean - let header = $("<tr>") - header.append("<th></th>"); - if(preferred_language_algo === "french") { - header.append("<th title='Précision obtenue avec tous les indicateurs'><i>I</i></th>") - for (let key of result["tops_k"]) { header.append("<th title='Précision obtenue avec la liste de "+key+" indicateurs'>" + key + "</th>") } // adding header with tops-k - header.append("<th title=\"Précision moyenne obtenue pour la variable d'environnement\">Moyenne</th>") - } else { - header.append("<th title='Accuracy obtained with all indicators'>I</th>") - for (let key of result["tops_k"]) { header.append("<th title='Accuracy obtained by list with "+key+" indicators'>" + key + "</th>") } // adding header with tops-k - header.append("<th title='Mean accuracy for the environment variable'>Mean</th>") + // header of table: None, 10, 20, ..., 100, Mean + let header = $("<tr>") + header.append("<th></th>"); + if(preferred_language_algo === "french") { + header.append("<th title='Précision obtenue avec 
tous les indicateurs'><i>I</i></th>") + for (let key of result["tops_k"]) { header.append("<th title='Précision obtenue avec la liste de "+key+" indicateurs'>" + key + "</th>") } // adding header with tops-k + header.append("<th title=\"Précision moyenne obtenue pour la variable d'environnement\">Moyenne</th>") + } else { + header.append("<th title='Accuracy obtained with all indicators'>I</th>") + for (let key of result["tops_k"]) { header.append("<th title='Accuracy obtained by list with "+key+" indicators'>" + key + "</th>") } // adding header with tops-k + header.append("<th title='Mean accuracy for the environment variable'>Mean</th>") + } + table.append(header); + console.log(results) + // content of table with computed accuracies + for (let key in results) { // iterating over env variables + let row = $("<tr>"); + let env = results[key]; + let max = getMaxValueDict(results[key]["accuracies"], env["accuracy_none"]); + row.append("<td>" + capitalizeFirstLetter(key.split("_").join(" ")) + "</td>") + + let col = $("<td>").text(env["accuracy_none"].toFixed(2) + "%") + if (env["accuracy_none"] === max) { + col.css("background-color", "#71dd8a") } - table.append(header); - console.log(results) - // content of table with computed accuracies - for (let key in results) { // iterating over env variables - let row = $("<tr>"); - console.log(key) - console.log(typeof(results[key])) - let env = capitalizeFirstLetter(results[key].split("_").join(" ")); - let max = getMaxValueDict(env["accuracies"], env["accuracy_none"]); - row.append("<td>" + key + "</td>") + row.append(col); - let col = $("<td>").text(env["accuracy_none"].toFixed(2) + "%") - if (env["accuracy_none"] === max) { - col.css("background-color", "#71dd8a") + for (let top_k in env["accuracies"]) { // iterating over top-k for each EV + let col = $("<td>").text(env["accuracies"][top_k].toFixed(2) + "%"); + if (env["accuracies"][top_k] === max) { + col.css("background-color", "#71dd8a"); } row.append(col); - - for 
(let top_k in env["accuracies"]) { // iterating over top-k for each EV - let col = $("<td>").text(env["accuracies"][top_k].toFixed(2) + "%"); - if (env["accuracies"][top_k] === max) { - col.css("background-color", "#71dd8a"); - } - row.append(col); - } - row.append("<td>" + env["mean"].toFixed(2) + "%</td>"); - table.append(row) } + row.append("<td>" + env["mean"].toFixed(2) + "%</td>"); + table.append(row) + } - // download icon - let download; - if(preferred_language_algo === "french") { - download = $("<i class='fas fa-download' style='margin-left: 1rem;' title='Exporter cette table comme un fichier Excel.'></i>") - } else { - download = $("<i class='fas fa-download' style='margin-left: 1rem;' title='Export this table as an Excel file.'></i>") - } - download.on("click", function (e) { - e.preventDefault(); - console.log($(this)) - console.log($(this)[0].nextElementSibling) - console.log($(this)[0].nextSibling) - console.log($("#tableToExport")) - $("#tableToExport").table2excel({ - type: 'xls', - filename: chosen_clf + '.xls', - preserveColors: true - }); + // download icon + let download; + if(preferred_language_algo === "french") { + download = $("<i class='fas fa-download' style='margin-left: 1rem;' title='Exporter cette table comme un fichier Excel.'></i>") + } else { + download = $("<i class='fas fa-download' style='margin-left: 1rem;' title='Export this table as an Excel file.'></i>") + } + download.on("click", function (e) { + e.preventDefault(); + $("#tableToExport").table2excel({ + type: 'xls', + filename: chosen_clf + '.xls', + preserveColors: true }); - let containing_table = $("<div>").prop("class", "wrapperTable"); - containing_table.append(keep).append(download).append(table); - - // list of parameters used to have the current results - let params = ""; - for (let elem in current_parameters) { - if (elem in userParameters) { - params += "<i>" + elem + "</i>: " + userParameters[elem] + " ; "; // adding user value - } else { - params += "<i>" + elem 
+ "</i>: " + current_parameters[elem]["default"] + " ; "; - } // adding default value - } - containing_table.append(params); + }); + let containing_table = $("<div>").prop("class", "wrapperTable"); + containing_table.append(keep).append(download).append(table); - // Mean accuracy for this classifier - let mean_clf = 0; - for (let env in results) { - mean_clf += results[env]["mean"]; - } - mean_clf /= Object.keys(results).length; - if(preferred_language_algo === "french") { - containing_table.append("<br/> <b>Moyenne de cet algorithme : </b>" + mean_clf.toFixed(2) + "%"); + // list of parameters used to have the current results + let params = ""; + for (let elem in current_parameters) { + if (elem in userParameters) { + params += "<i>" + elem + "</i>: " + userParameters[elem] + " ; "; // adding user value } else { - containing_table.append("<br/> <b>Mean for this classifier: </b>" + mean_clf.toFixed(2) + "%"); - } - + params += "<i>" + elem + "</i>: " + current_parameters[elem]["default"] + " ; "; + } // adding default value + } + containing_table.append(params); - // append all to HTML - $("#resultsDiv").append(containing_table); - $("body").css("cursor", "default"); - }, - error: function (result, textStatus, errorThrown) { - console.log(errorThrown); - alert("something went wrong while training. 
Please check your parameters<br>" + textStatus); - $("body").css("cursor", "default"); + // Mean accuracy for this classifier + let mean_clf = 0; + for (let env in results) { + mean_clf += results[env]["mean"]; + } + mean_clf /= Object.keys(results).length; + if(preferred_language_algo === "french") { + containing_table.append("<br/> <b>Moyenne de cet algorithme : </b>" + mean_clf.toFixed(2) + "%"); + } else { + containing_table.append("<br/> <b>Mean for this classifier: </b>" + mean_clf.toFixed(2) + "%"); } - }); - return false; // don't reload + // append all to HTML + $("#resultsDiv").append(containing_table); + $("body").css("cursor", "default"); + }, + error: function (result, textStatus, errorThrown) { + console.log(errorThrown); + alert("something went wrong while training. Please check your parameters<br>" + textStatus); + $("body").css("cursor", "default"); + } }); -$("#abortRun").on("click", function () { - request_run.abort(); // TODO: abort also on Flask - alert("Request have been aborted"); + return false; // do not reload }); +// empty results when clicking on the trash icon $("#clearResults").on("click", function () { // empty div and add title + "clear all" button $("#resultsDiv") @@ -223,11 +215,11 @@ $("#clearResults").on("click", function () { /** * Adds the parameter in the interface, with label and input. - * @param {string} label The name of the parameter. - * @param {string} content The value of the parameter (default value). - * @param {string} type The type of the input (i.e. str, int, float, bool or None). - * @param {string} description The description of the parameter. It corresponds to the first sentence in the doc (sklearn) for the parameter. - * @param {boolean} hidden A boolean that indicates if the field is hidden or not (because we display only 5 parameters by default). + * @param {string} label The name of the parameter. + * @param {string} content The value of the parameter (default value). 
+ * @param {string} type The type of the input (i.e. str, int, float, bool or None). + * @param {string} description The description of the parameter. It corresponds to the first sentence in the doc (sklearn) for the parameter. + * @param {boolean} hidden A boolean that indicates if the field is hidden or not (because we display only 5 parameters by default). */ function addElement(label, content, type, description, hidden) { // adds something like : diff --git a/predihood/static/js/carto.js b/predihood/static/js/carto.js index 281def39e09d18fe1902eec2973d871b74239d27..c14e74330df757d3387c3f9c7f51ce0fe51016da 100644 --- a/predihood/static/js/carto.js +++ b/predihood/static/js/carto.js @@ -40,9 +40,9 @@ function initialize() { } -/* -** Event for zoom changes : updates a label and if zoom enabled and above min zoom level, display iris -*/ +/** + * Event for zoom changes : updates a label and if zoom enabled and above min zoom level, display iris + */ function zoomendEvent() { zoomLevel = map.getZoom(); document.getElementById("spanZoomLevel").innerHTML = zoomLevel; @@ -56,25 +56,15 @@ function zoomendEvent() { } } -/* -** Method for deleting a layer (e.g., all iris in irisLayer) -*/ +/** + * Method for deleting a layer (e.g., all iris in irisLayer) + */ function removeLayer() { map.removeLayer(irisLayer); irisLayer = null; $("#zoneMessages").html(""); } -/** - * Reset the style of all highlighted layer elements - */ -function resetHighlightAll() { - if(irisLayer != null) { - $.each(irisLayer["layers"], function(key, value) { - irisLayer["layers"].resetStyle(key); - }); - } -} /** * Display popup with several information about the neighbourhood (descriptive information and environment variables) when clicking on it. 
@@ -221,7 +211,7 @@ function addLayerFromGeoJSON(geojson, events, style, typeMethod){ */ function eventsIRIS(feature, layer) { layer.on({ - mouseover: highlightFeature, + // mouseover: highlightFeature, //mouseout: resetHighlight, click: displayPopup //showPredictions }); diff --git a/predihood/static/js/prediction.js b/predihood/static/js/prediction.js index 42e3c5745e871b48f331943aa91c3378768eb6b3..882ea7468d2f14f37c9a989f4410e65c1b4a9c65 100644 --- a/predihood/static/js/prediction.js +++ b/predihood/static/js/prediction.js @@ -1,3 +1,9 @@ +/** + * Send an AJAX request to get predictions for the given IRIS. + * @param iris_code a string containing the code of the IRIS to predict. + * @param algorithm_name a string containing the name of the algorithm used to predict environment. + * @returns {} a dictionary containing results of predictions, i.e. a value and a score for each EV. + */ function predict(iris_code, algorithm_name) { $("body").css("cursor", "progress"); let predictions = null; diff --git a/predihood/static/js/utils.js b/predihood/static/js/utils.js index 6df8aa1c088df224555cb6c92f5a6fc7862ab8dc..349f75ffddf46f4edf977d0a21b8a6f2dfa629ed 100644 --- a/predihood/static/js/utils.js +++ b/predihood/static/js/utils.js @@ -173,6 +173,10 @@ function parseFloatComplex(str) { } } +/** + * Get a list containing the environment variables. + * @returns {[]} a list containing the environment variables' names + */ function getEnvironmentVariables() { let env_var = null; $.ajax({ @@ -190,17 +194,10 @@ function getEnvironmentVariables() { return env_var; } - -function highlightFeature(e) { - var layer = e.target; - layer.setStyle({ - weight: 1, - color: '#666', - fillOpacity: 0.25 - }); - -} - +/** + * Get the preferred language that has been selected by the user. + * @return {string} a string containing the name of the preferred language. 
+ */ function get_preferred_language() { let chosen_language = undefined; diff --git a/predihood/templates/algorithmic-interface.html b/predihood/templates/algorithmic-interface.html index 381eceee3517664d5029aa3eab0e8af694deae19..077259f44e440525bf66071ce1814c65baa2e8b7 100644 --- a/predihood/templates/algorithmic-interface.html +++ b/predihood/templates/algorithmic-interface.html @@ -10,7 +10,7 @@ {% else %} <p class="text-gray font-weight-bold text-uppercase px-3 small pb-4 mb-0" title="Choose an algorithm among the list below.">Select an algorithm</p> {% endif %} - <form id="formAlgorithm" class="px-3 small pb-4 mb-0"> + <form id="formAlgorithm" class="px-3 small"> <select id="selectAlgorithm"> {% if language == "french" %} <option selected value="Algorithme"> -- choisir un algorithme --</option> @@ -24,9 +24,9 @@ <hr/> {% if language == "french" %} - <p class="font-weight-bold text-uppercase px-3 small pb-4 mb-0" title="Paramétrer l'algorithme choisi.">Paramétrer l'algorithme</p> + <p class="font-weight-bold text-uppercase px-3 small" title="Paramétrer l'algorithme choisi.">Paramétrer l'algorithme</p> {% else %} - <p class="font-weight-bold text-uppercase px-3 small pb-4 mb-0" title="Tune the selected algorithm.">Tune algorithm</p> + <p class="font-weight-bold text-uppercase px-3 small" title="Tune the selected algorithm.">Tune algorithm</p> {% endif %} <form id="formParameters"> <div class="col-12" id="divParameters"> @@ -45,9 +45,9 @@ </form> {% if language == "french" %} - <p class="font-weight-bold text-uppercase px-3 small pb-4 mb-0" title="Paramétrer la répartition entre les jeux d'apprentissage et de test." style="padding-top: 1rem; padding-bottom: 0; margin-bottom: 0">Jeux de données</p> + <p class="font-weight-bold text-uppercase px-3 small" title="Paramétrer la répartition entre les jeux d'apprentissage et de test." 
style="padding-top: 1rem; padding-bottom: 0; margin-bottom: 0">Jeux de données</p> {% else %} - <p class="font-weight-bold text-uppercase px-3 small pb-4 mb-0" title="Tune the repartition of the data into train and test sets." style="padding-top: 1rem; padding-bottom: 0; margin-bottom: 0">Tune dataset</p> + <p class="font-weight-bold text-uppercase px-3 small" title="Tune the repartition of the data into train and test sets." style="padding-top: 1rem; padding-bottom: 0; margin-bottom: 0">Tune dataset</p> {% endif %} <ul class="nav flex-column bg-white mb-0"> <li class="nav-item"> @@ -108,7 +108,6 @@ Train, test and evaluate </button> {% endif %} - <!--<button id="abortRun" class="btn btn-danger" title="Arbort the current request">Abort</button>--> </div> {% include 'footer.html' %} </aside> diff --git a/predihood/templates/header.html b/predihood/templates/header.html index bccf4104bcd7444e1ff380cc8a18c914a15abe51..fa7f68d9cc03cbad08efbb8b18ef1d24b7f78cb8 100644 --- a/predihood/templates/header.html +++ b/predihood/templates/header.html @@ -1,9 +1,4 @@ <header class="mb-3"> <h1><a href="/"><img src="{{url_for('static', filename='img/favicon.png')}}"></a> predihood</h1> - {% if language == "french" %} - <em>Un outil de visualisation des IRIS</em> - {% else %} - <em>A tool for visualizing IRIS</em> - {% endif %} <hr> </header> \ No newline at end of file diff --git a/predihood/tests_utility_functions.py b/predihood/tests.py similarity index 91% rename from predihood/tests_utility_functions.py rename to predihood/tests.py index e7a60c59cdaf05f12fd31c282f321e8083b18157..97fb69451af178d245f8b8078decbbf8b4d59b55 100644 --- a/predihood/tests_utility_functions.py +++ b/predihood/tests.py @@ -3,9 +3,11 @@ # ============================================================================= # Unit tests for predihood. 
# ============================================================================= - +import os +import pandas as pd import unittest +from predihood.config import FOLDER_DATASETS, ENVIRONMENT_VALUES from predihood.utility_functions import check_train_test_percentages, intersection, union, similarity, \ get_most_frequent, address_to_code, address_to_city, indicator_full_to_short_label, \ indicator_short_to_full_label, get_classifier, set_classifier, signature, add_assessment_to_file @@ -47,9 +49,6 @@ class TestCase(unittest.TestCase): full_label = indicator_short_to_full_label(short_label) assert full_label == "Pop 11-17 ans en 2014 (princ)" - def test_hierarchy(self): - assert True == True # TODO - def test_get_classifier(self): # test if selecting a classifier gives the correct object classifier_name = "KNeighbors Classifier" @@ -140,6 +139,14 @@ class TestCase(unittest.TestCase): result = add_assessment_to_file(code_iris, values) assert result == "iris already assessed" + def test_values_dataset(self): + # test if values used in dataset are the same as the ones declared by social science researchers + filename = os.path.join(FOLDER_DATASETS, "data_density.csv") + dataset = pd.read_csv(filename) + values_for_building_type = set([value for key, value in ENVIRONMENT_VALUES["building_type"].items()]) + + assert set(dataset["building_type"].tolist()) == values_for_building_type + if __name__ == "__main__": unittest.main(verbosity=2) # run all tests with verbose mode diff --git a/predihood/tests_selection.py b/predihood/tests_selection.py deleted file mode 100644 index 1e5640fd2331512da69620dce814923640fe3087..0000000000000000000000000000000000000000 --- a/predihood/tests_selection.py +++ /dev/null @@ -1,30 +0,0 @@ -#!/usr/bin/env python -# encoding: utf-8 -# ============================================================================= -# Unit tests for predihood. 
-# ============================================================================= -import os - -import pandas as pd -import unittest - -from predihood.config import FOLDER_DATASETS, ENVIRONMENT_VALUES - - -class TestCase(unittest.TestCase): - """ - A class for Predihood unit tests. - """ - - def test_values_dataset(self): - # test if values used in dataset are the same than the one declared by social science researchers - filename = os.path.join(FOLDER_DATASETS, "data_density.csv") - dataset = pd.read_csv(filename) - values_for_building_type = set([value for key, value in ENVIRONMENT_VALUES["building_type"].items()]) - - assert set(dataset["building_type"].tolist()) == values_for_building_type - - -if __name__ == "__main__": - unittest.main(verbosity=2) # run all tests with verbose mode - diff --git a/predihood/utility_functions.py b/predihood/utility_functions.py index 8ef5ceba964423c90d5411286b824520d3627994..1eb5b1ab82ef091440ba2349b1b586eda3493034 100644 --- a/predihood/utility_functions.py +++ b/predihood/utility_functions.py @@ -221,7 +221,7 @@ def signature(chosen_algorithm): try: # model = eval(_chosen_algorithm) # never use eval on untrusted strings model = get_classifier(chosen_algorithm) - doc = model.__doc__ # TODO: specify case when there is no doc (user-implemented algorithm) + doc = model.__doc__ param_section = "Parameters" dashes = "-" * len(param_section) # ------- number_spaces = doc.find(dashes) - (doc.find(param_section) + len(param_section)) diff --git a/readme2.md b/readme2.md deleted file mode 100644 index 105d1ef99ea648971247708144012431e5f3b57a..0000000000000000000000000000000000000000 --- a/readme2.md +++ /dev/null @@ -1,61 +0,0 @@ -# Predihood - -Predihood is an application for visualizing [IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523) (administrative areas defined by the French institute of statistics, they can be considered as neighbourhoods) and indicators which describe them (e.g. 
number of bakeries, average income and even the number of houses over 250m^2). - -## Statement of need - -Predihood proposes an interface for searching and comparing neighbourhoods. - -## Installation instructions - -### Requirements -- Python, version >=3 -- [MongoDB](https://www.mongodb.com/), version >=4 for importing the database about neighbourhoods. - -### Installation - -For installing Predihood, type in a terminal: - -``` -python3 -m pip install -e predihood/ --process-dependency-links -``` - -This command install dependencies, including [mongiris](https://gitlab.liris.cnrs.fr/fduchate/mongiris) which provide the querying of the MongoDB database containing information about neighbourhoods. - -Create this database is mandatory. To achieve this, execute this command (from the MongoDB's executables directory if needed): - -``` -./mongorestore --archive=/path/to/dump-iris.bin -``` - -where `/path/to/` is the path to the dump file of the IRIS collection (provided with the package mongiris in `mongiris/data/dump-iris.bin`). - -### Run the interface - -For running *Predihood*, type in a terminal: - -``` -python3 main.py -``` - -After some information, the terminal display the URL for testing *Predihood* : `http://localhost:8080/`. If you want to try the cartographic interface, click on the button "Search a neighbourhood". Otherwise, if you want to configure and test your algorithm in our interface, click on the button "Tune my classifier". - -## Example usage - -For the cartographic interface, an example would be: - -1. Type a query in the panel on the left, e.g. "Lyon". This will display all neighbourhoods that contain "Lyon" in their name or their township. -2. Click on a neighbourhood (which are the small areas in blue). A tooltip will appear with some information about the neighbourhood. There are more informations when clicking on the "More details" link. -3. In order to predict the environment variables, you have to choose the classifier. 
The "Random Forest" classifier is recommended by default. After some seconds, predictions will appear in the tooltip. This will help you for comparing neighbourhoods between them.s - -For the algorithmic interface, an example would be: - -1. Choose an algorithm - -## Community guideline - - -## Functionality - - -## Tests \ No newline at end of file
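
Review note on the `signature` hunk in `predihood/utility_functions.py`: the change drops the TODO about classifiers without a docstring, but `doc.find(...)` would still raise if `model.__doc__` is `None` (for example, a user-implemented algorithm). A minimal sketch of a guard is given below. It is not part of this commit; the helper name `parameters_header_gap` is hypothetical, and only the offset computation is taken from the diff.

```python
def parameters_header_gap(doc):
    """Hypothetical helper, not in the repository.

    Mirrors the `number_spaces` computation in `signature`: the number of
    characters between the end of the "Parameters" heading and the start of
    its dashed underline. Returns None when there is no docstring or no
    such section, instead of raising.
    """
    if not doc:  # covers None and the empty string
        return None
    param_section = "Parameters"
    dashes = "-" * len(param_section)
    start = doc.find(param_section)
    underline = doc.find(dashes)
    if start == -1 or underline == -1:  # section heading or underline missing
        return None
    return underline - (start + len(param_section))


# Usage with a numpydoc-style docstring and with a missing one.
example_doc = "Summary line.\n\n    Parameters\n    ----------\n    n : int\n"
gap_with_doc = parameters_header_gap(example_doc)
gap_without_doc = parameters_header_gap(None)
```

Returning `None` rather than raising would let the caller decide how to present an algorithm that has no documented parameters; whether that fits the interface is a design choice for the maintainers.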