diff --git a/README.md b/README.md
index 6ca7990cbed26814a7fea1c2d1fd0f5e29df5336..eb50d6ab455675bdfc4391d5076992c0f8e39a32 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,19 @@
 # mongiris package
-This package is an interface for querying French administrative areas ([IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523), similar to neighborhoods) stored as documents in MongoDB.
+This Python package is an interface for querying French administrative areas ([IRIS](https://www.insee.fr/fr/metadonnees/definition/c1523), similar to neighborhoods) stored as documents in MongoDB.
 Each IRIS includes indicators (e.g., average income, types of housings, number of bakeries or schools) that are useful for social sciences studies, for house/neighborhood recommendation, etc.
 In this package, the ~50,000 IRIS and their 350-650 indicators have been integrated and stored in the [GeoJSON format](https://geojson.org/), and an API enables the manipulation of these data.
-## Pré-requis
+## Prerequisites
 - Python, version >=3
 - [MongoDB](https://www.mongodb.com/), version >=4, in which it is necessary to import the IRIS database (see Installation).
 ## Installation
-To install mongiris:
+To install mongiris (and its dependencies):
 ```
 python3 -m pip install git+https://fduchate@gitlab.liris.cnrs.fr/fduchate/mongiris.git#egg=mongiris
@@ -30,7 +30,6 @@ mongorestore --archive=/path/to/dump-dbinsee.bin
 where `/path/to/` indicates the path to the downloaded dump database. <!--(provided with the source package mongiris in `mongiris/data/dump/dump-dbinsee.bin`).--> This restoration may take a few minutes as the geospatial indexes are rebuilt.
-
 ## Usage
 In MongoDB, the database is named `dbinsee`. It contains three collections:
@@ -41,10 +40,10 @@ In MongoDB, the database is named `dbinsee`. It contains three collections:
 To manipulate the database, simply connect to MongoDB by creating an object of the `Mongiris` class.
 Using this object, twenty methods are available for querying the data.
-Below is a minimal example of connection and queries:
+Below is a minimal example of connection and queries (from `tests/dummy.py` file):
 ```
-from mongiris.main import Mongiris
+from mongiris.api import Mongiris
 db = Mongiris()
@@ -53,12 +52,14 @@ counts = db.count_documents(db.collection_indic, {})
 # get complete information about iris identified with code 593500203
 iris = db.find_one_document(db.collection_iris, {"properties.CODE_IRIS": "593500203"})
+print(iris)
 # get iris which contains coordinates 3.685111, 46.514643
-iris = db.point_in_which_iris([3.685111, 46.514643])
+iris2 = db.point_in_which_iris([3.685111, 46.514643])
+print(iris2)
 ```
-More examples, including testing geospatial queries, are available in the `tests/mongiris_test.py` file.
+More examples, including testing geospatial queries, are available in the `tests/api_tests.py` file.
 ## Contributors
diff --git a/doc/index.html b/doc/index.html
index ac31c6ba909be5dc2211cacf0e4fc298242f1b6d..4e487a3b01ef30c1fba3095073a9864727d9de9e 100644
--- a/doc/index.html
+++ b/doc/index.html
@@ -22,8 +22,8 @@
 <section id="section-intro">
 The package <code>mongiris</code> consists of two modules:
 <ul>
+ <li><a href="api.html">api</a>, for manipulating IRIS data.</li>
  <li><a href="integrator.html">integrator</a>, for integrating data sources. There should be no need to run this module since the MongoDB dumps are provided.</li>
- <li><a href=""api.html>api</a>, for manipulating IRIS data.</li>
 </ul>
 </section>
diff --git a/mongiris/tests/dummy.py b/mongiris/tests/dummy.py
index 6e4a216b91f65dabae73b6156702c476ebee4f6b..686ddcc32398670a0b49d5fa1527ea0e76d597f0 100644
--- a/mongiris/tests/dummy.py
+++ b/mongiris/tests/dummy.py
@@ -1,3 +1,9 @@
+#!/usr/bin/env python
+# encoding: utf-8
+# =============================================================================
+# Dummy test for mongiris.
+# =============================================================================
+
 from mongiris.api import Mongiris
 import json
diff --git a/paper.bib b/paper.bib
index ce335322e4f34336b3a97cf38e208df335f70815..9f41b76d322aea54c62856dffde3bdee6a444792 100644
--- a/paper.bib
+++ b/paper.bib
@@ -39,10 +39,16 @@ keywords = "Home buyer, Real estate website, Housing search behavior, Case-based
 @misc{datafrance,
   title={DataFrance},
   howpublished={https://datafrance.info/},
-  note = {https://datafrance.info/},
   year=2018
 }
+@misc{insee-iris,
+  title={{Definition of IRIS}},
+  author={INSEE},
+  howpublished={http://www.insee.fr/en/metadonnees/definition/c1523},
+  year=2016
+}
+
 @inproceedings{airbnb2017,
   title={{Comment les h{\^o}tes et clients d'Airbnb parlent-ils des lieux ? Une analyse exploratoire {\`a} partir du cas parisien}},
   booktitle={EXCES-EXtraction de Connaissances {\`a} partir de donn{\'e}Es Spatialis{\'e}es},
diff --git a/paper.md b/paper.md
index 577dbe1287b401183a449f13edfa174d8d47fc51..5b489dedc3972d569060770ef4dc87fb3abe8c8a 100644
--- a/paper.md
+++ b/paper.md
@@ -11,7 +11,7 @@ authors:
     orcid: 0000-0001-6803-917X
     affiliation: 1
   - name: Franck Favetta
-    orcid: 0000-0000-0000-0000
+    orcid: 0000-0003-2039-3481
     affiliation: 1
 affiliations:
   - name: LIRIS, UMR5205 Université Claude Bernard Lyon 1, Lyon, France
@@ -27,11 +27,12 @@ For instance, social science researchers study the relationship between citizens
 National institutions (e.g., Open Data initiatives, INSEE in France) may produce data about neighborhoods, but they are usually spread in heterogenous files (databases, spreadsheets). Initiatives such as DataFrance [@datafrance] enable their visualization on a map, but their authors do not share collected data. Thus, researchers have to manually collect and integrate raw data from national institutions, a challenging issue refered to as `data integration` [@christen2012data].
 Although some tools such as OpenRefine or Talend facilitates this integration, they require expert knowledge and programming skills.
-For these reasons, we propose the package Mongiris, which includes integrated data about French neighborhhods (IRIS) and an API for manipulating this data.
+The French administration provides data about IRIS [@insee-iris], a small division unit of the national territory for statistical purposes (mostly with the same number of residents, thus mainly small-sized in cities and wider in rural areas).
+To ease the exploitation of IRIS, we propose the package Mongiris, which includes integrated data about these neighborhoods (IRIS) and an API for manipulating them.
 # Summary
-The package is composed of two modules: integration and API.
+The Python package is composed of two modules: integration and API.
 The `integration module` is responsible for extracting information from data sources. The module currently supports spreadsheets produced by [INSEE](https://www.insee.fr/). Since data evolve (e.g., statistics from INSEE are updated every few years), the integration module may be run. Note that new data may be stored in different database or collections so that the evolution can be studied.
@@ -39,7 +40,7 @@ For most users, there is no need to use the integration module since the dump of
 The current dump contains roughly 37,000 IRIS with 375 indicators and 12,800 IRIS with 640 indicators. <!-- {362: 36530, 650: 11738, 627: 1057, 385: 79} -->
-The `API module` includes common operations such as searching an IRIS (by IRIS code or according any field value), inserting, updating or deleting an IRIS.
+The `API module` includes common operations such as searching for an IRIS (by IRIS code or according to any field value), inserting, updating or deleting an IRIS.
 It also provides geospatial operations useful in a research context: get IRIS given coordinates, get all adjacent or close IRIS from a given IRIS, find all IRIS in a given area, etc.
 The Mongiris package is currently used in Mapiris, a tool for visualizing and searching for IRIS.
diff --git a/paper.pdf b/paper.pdf
index 20f5b8813ab1de37c236958805e47a3ac3ed92f0..e842848952959a356ffff863df8aa0a5d7e08b2d 100644
Binary files a/paper.pdf and b/paper.pdf differ