Commit b2c461eb authored by Duchateau Fabien

comments from ff

parent f4e9c79b
# mongiris package
This package is an interface for querying French administrative areas ([IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523), similar to neighborhoods) stored as documents in MongoDB.
This Python package is an interface for querying French administrative areas ([IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523), similar to neighborhoods) stored as documents in MongoDB.
Each IRIS includes indicators (e.g., average income, types of housing, number of bakeries or schools) that are useful for social science studies, for house/neighborhood recommendation, etc.
In this package, the ~50,000 IRIS and their 350-650 indicators have been integrated and stored in the [GeoJSON format](https://geojson.org/), and an API enables the manipulation of these data.
## Pré-requis
## Prerequisites
- Python, version >=3
- [MongoDB](https://www.mongodb.com/), version >=4, in which it is necessary to import the IRIS database (see Installation).
## Installation
To install mongiris:
To install mongiris (and its dependencies):
```
python3 -m pip install git+https://fduchate@gitlab.liris.cnrs.fr/fduchate/mongiris.git#egg=mongiris
......@@ -30,7 +30,6 @@ mongorestore --archive=/path/to/dump-dbinsee.bin
where `/path/to/` indicates the path to the downloaded dump database. <!--(provided with the source package mongiris in `mongiris/data/dump/dump-dbinsee.bin`).-->
This restoration may take a few minutes as the geospatial indexes are rebuilt.
## Usage
In MongoDB, the database is named `dbinsee`. It contains three collections:
......@@ -41,10 +40,10 @@ In MongoDB, the database is named `dbinsee`. It contains three collections:
To manipulate the database, simply connect to MongoDB by creating an object of the `Mongiris` class.
Using this object, twenty methods are available for querying the data.
Below is a minimal example of connection and queries:
Below is a minimal example of connection and queries (from the `tests/dummy.py` file):
```
from mongiris.main import Mongiris
from mongiris.api import Mongiris
db = Mongiris()
......@@ -53,12 +52,14 @@ counts = db.count_documents(db.collection_indic, {})
# get complete information about iris identified with code 593500203
iris = db.find_one_document(db.collection_iris, {"properties.CODE_IRIS": "593500203"})
print(iris)
# get iris which contains coordinates 3.685111, 46.514643
iris = db.point_in_which_iris([3.685111, 46.514643])
iris2 = db.point_in_which_iris([3.685111, 46.514643])
print(iris2)
```
More examples, including testing geospatial queries, are available in the `tests/mongiris_test.py` file.
More examples, including testing geospatial queries, are available in the `tests/api_tests.py` file.
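As an illustration of the kind of geospatial query involved, the sketch below builds a MongoDB filter that matches documents whose geometry contains a given point, using the standard `$geoIntersects` operator. The helper name `point_filter` is hypothetical and not part of the mongiris API, and the `geometry` field name is an assumption based on the GeoJSON layout of the documents; this is not necessarily how `point_in_which_iris` is implemented.

```python
# Hypothetical helper for illustration only; not part of the mongiris API.
def point_filter(longitude, latitude, field="geometry"):
    """Build a MongoDB filter for documents whose geometry contains the point.

    GeoJSON coordinates are ordered [longitude, latitude].
    The field name "geometry" is an assumption about the document layout.
    """
    point = {"type": "Point", "coordinates": [longitude, latitude]}
    return {field: {"$geoIntersects": {"$geometry": point}}}


# Same coordinates as in the example above
query = point_filter(3.685111, 46.514643)
print(query)
```

Assuming `find_one_document` accepts any MongoDB filter, as in the `CODE_IRIS` lookup above, such a dictionary could presumably be passed as `db.find_one_document(db.collection_iris, query)`.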
## Contributors
......
......@@ -22,8 +22,8 @@
<section id="section-intro">
The package <code>mongiris</code> consists of two modules:
<ul>
<li><a href="api.html">api</a>, for manipulating IRIS data.</li>
<li><a href="integrator.html">integrator</a>, for integrating data sources. There should be no need to run this module since the MongoDB dumps are provided.</li>
<li><a href=""api.html>api</a>, for manipulating IRIS data.</li>
</ul>
</section>
......
#!/usr/bin/env python
# encoding: utf-8
# =============================================================================
# Dummy test for mongiris.
# =============================================================================
from mongiris.api import Mongiris
import json
......
......@@ -39,10 +39,16 @@ keywords = "Home buyer, Real estate website, Housing search behavior, Case-based
@misc{datafrance,
title={DataFrance},
howpublished={https://datafrance.info/},
note = {https://datafrance.info/},
year=2018
}
@misc{insee-iris,
title={{Definition of IRIS}},
author={INSEE},
howpublished={http://www.insee.fr/en/metadonnees/definition/c1523},
year=2016
}
@inproceedings{airbnb2017,
title={{Comment les h{\^o}tes et clients d'Airbnb parlent-ils des lieux ? Une analyse exploratoire {\`a} partir du cas parisien}},
booktitle={EXCES-EXtraction de Connaissances {\`a} partir de donn{\'e}Es Spatialis{\'e}es},
......
......@@ -11,7 +11,7 @@ authors:
orcid: 0000-0001-6803-917X
affiliation: 1
- name: Franck Favetta
orcid: 0000-0000-0000-0000
orcid: 0000-0003-2039-3481
affiliation: 1
affiliations:
- name: LIRIS, UMR5205 Université Claude Bernard Lyon 1, Lyon, France
......@@ -27,11 +27,12 @@ For instance, social science researchers study the relationship between citizens
National institutions (e.g., Open Data initiatives, INSEE in France) may produce data about neighborhoods, but they are usually spread across heterogeneous files (databases, spreadsheets). Initiatives such as DataFrance [@datafrance] enable their visualization on a map, but their authors do not share collected data.
Thus, researchers have to manually collect and integrate raw data from national institutions, a challenging issue referred to as `data integration` [@christen2012data]. Although some tools such as OpenRefine or Talend facilitate this integration, they require expert knowledge and programming skills.
For these reasons, we propose the package Mongiris, which includes integrated data about French neighborhhods (IRIS) and an API for manipulating this data.
The French administration provides data about IRIS [@insee-iris], a small division unit of the national territory for statistical purposes (mostly with the same number of residents, thus mainly small-sized in cities and wider in rural areas).
To ease the exploitation of IRIS, we propose the package Mongiris, which includes integrated data about these neighborhoods (IRIS) and an API for manipulating them.
# Summary
The package is composed of two modules: integration and API.
The Python package is composed of two modules: integration and API.
The `integration module` is responsible for extracting information from data sources. The module currently supports spreadsheets produced by [INSEE](https://www.insee.fr/).
Since data evolve (e.g., statistics from INSEE are updated every few years), the integration module may be run again on newer sources. Note that new data may be stored in a different database or collection so that the evolution can be studied.
......@@ -39,7 +40,7 @@ For most users, there is no need to use the integration module since the dump of
The current dump contains roughly 37,000 IRIS with 375 indicators and 12,800 IRIS with 640 indicators.
<!-- {362: 36530, 650: 11738, 627: 1057, 385: 79} -->
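The rounded figures above can be checked against the raw counts in the comment. The sketch below assumes that the comment maps a number of indicators to the number of IRIS carrying that many indicators, and that the two groups are split at 500 indicators; both are interpretations, not statements from the authors.

```python
# Counts copied from the comment above.
# Assumed meaning: {number of indicators: number of IRIS with that many indicators}
iris_per_indicator_count = {362: 36530, 650: 11738, 627: 1057, 385: 79}

# Assumed grouping: fewer than 500 indicators vs. 500 or more
small = sum(n for k, n in iris_per_indicator_count.items() if k < 500)
large = sum(n for k, n in iris_per_indicator_count.items() if k >= 500)
total = small + large

print(small)  # 36609, i.e. roughly 37,000 IRIS
print(large)  # 12795, i.e. roughly 12,800 IRIS
print(total)  # 49404, i.e. roughly 50,000 IRIS overall
```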
The `API module` includes common operations such as searching an IRIS (by IRIS code or according any field value), inserting, updating or deleting an IRIS.
The `API module` includes common operations such as searching for an IRIS (by IRIS code or according to any field value), inserting, updating or deleting an IRIS.
It also provides geospatial operations useful in a research context: get IRIS given coordinates, get all adjacent or close IRIS from a given IRIS, find all IRIS in a given area, etc.
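To make these geospatial operations more concrete, here is a minimal sketch of MongoDB filters that could express two of them: finding the IRIS adjacent to a given IRIS, and finding all IRIS inside an area. The function names are hypothetical, the `geometry` field name is an assumption, and the filters rely only on the standard `$geoIntersects` and `$geoWithin` operators; the actual API module may proceed differently.

```python
# Hypothetical helpers for illustration only; not the mongiris implementation.

def adjacent_filter(iris, field="geometry"):
    """Filter for documents whose geometry touches or overlaps the given IRIS.

    The IRIS itself is excluded through its CODE_IRIS property.
    """
    code = iris["properties"]["CODE_IRIS"]
    return {
        field: {"$geoIntersects": {"$geometry": iris[field]}},
        "properties.CODE_IRIS": {"$ne": code},
    }


def within_area_filter(ring, field="geometry"):
    """Filter for documents entirely inside the polygon described by `ring`.

    `ring` is a list of [longitude, latitude] pairs; it is closed
    automatically because GeoJSON requires first == last position.
    """
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    polygon = {"type": "Polygon", "coordinates": [closed]}
    return {field: {"$geoWithin": {"$geometry": polygon}}}


# Toy IRIS document with made-up coordinates (only the code comes from the README)
toy_iris = {
    "properties": {"CODE_IRIS": "593500203"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[3.0, 50.0], [3.1, 50.0], [3.1, 50.1], [3.0, 50.0]]],
    },
}
adjacent_query = adjacent_filter(toy_iris)
area_query = within_area_filter([[3.0, 50.0], [3.2, 50.0], [3.2, 50.2], [3.0, 50.2]])
print(adjacent_query)
print(area_query)
```

Building the filters as plain dictionaries keeps the sketch runnable without a database; with a MongoDB instance they could be passed to a `find` call on the IRIS collection.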
The Mongiris package is currently used in Mapiris, a tool for visualizing and searching for IRIS.
......