## First approach: Embedding places using their Wikipedia pages
<div style="text-align:center">
<img src="documentation/imgs/first_approach.png"/>
<p>Figure 1: First approach general workflow</p>
</div>
In this first approach, the goal is to produce embeddings for place names. To do this, we designed a neural network (sketched below) that takes:
* **Input:** a text sequence (phrase)
* **Output:** latitude, longitude, and the place type
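As an illustration, here is a minimal sketch of what such a multi-output network could look like, assuming a Keras implementation; the vocabulary size, sequence length, layer sizes, and number of place types are placeholders, not the actual architecture used in this project.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 50000     # placeholder vocabulary size
MAX_LEN = 100          # placeholder sequence length
N_PLACE_TYPES = 20     # placeholder number of place types

# One text input, two heads: coordinate regression and place-type classification.
tokens = layers.Input(shape=(MAX_LEN,), dtype="int32", name="text_sequence")
x = layers.Embedding(VOCAB_SIZE, 128, mask_zero=True)(tokens)
x = layers.Bidirectional(layers.LSTM(64))(x)

coords = layers.Dense(2, name="lat_lon")(x)  # (latitude, longitude) regression
place_type = layers.Dense(N_PLACE_TYPES, activation="softmax", name="place_type")(x)

model = tf.keras.Model(tokens, [coords, place_type])
model.compile(
    optimizer="adam",
    loss={"lat_lon": "mse", "place_type": "sparse_categorical_crossentropy"},
)
```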
Input texts are selected using Wikidata to filter Wikipedia pages about geographic places (see the query sketch after this list). The filtered pages are then retrieved from the Wikipedia corpus file. For each page, we extract:
* Title
* Introduction text
* Coordinates of the place (latitude-longitude)
* Place type (using a mapping between Wikidata and DBpedia Place subclasses)
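As an illustration of the Wikidata filtering step, here is a minimal sketch using the public Wikidata SPARQL endpoint and the `SPARQLWrapper` package; the query shape below (items with a coordinate location and an English Wikipedia article) is an assumption, and the actual query, e.g. the restriction to Place subclasses, may be more involved.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Select Wikidata items that have a coordinate location (P625)
# and a corresponding English Wikipedia article.
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?item ?coord ?article WHERE {
  ?item wdt:P625 ?coord .
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
LIMIT 1000
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["article"]["value"], row["coord"]["value"])
```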
### Step 1: Parse Wikipedia data
First, download the Wikipedia corpus in the desired language, e.g. *enwiki-latest-pages-articles.xml.bz2*.
Then, parse it with the `gensim` segment_wiki script (documented [here](https://radimrehurek.com/gensim/scripts/segment_wiki.html)) using the following command:
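```bash
# Example invocation from the gensim documentation; adapt the file names to your dump.
python -m gensim.scripts.segment_wiki -i -f enwiki-latest-pages-articles.xml.bz2 -o enwiki-latest.json.gz
```

This produces a `.json.gz` file with one article per line (title, section titles, and section texts).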
## Second approach: Embedding places using their topology
From this point, we shift our focus to models built on heavily spatial/geographical data, in this case a gazetteer. In this second approach, we propose to generate an embedding for places (not place toponyms) based on their topology.
To do that, we use GeoNames data to build a topology graph. The graph is generated from the intersections found between place buffers, as sketched below.
(image here)
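A minimal sketch of the buffer-intersection graph construction, assuming GeoNames places loaded into a GeoDataFrame; the file name, the `geonameid` column, and the buffer radius are placeholders.

```python
import itertools
import geopandas as gpd
import networkx as nx

# Hypothetical GeoNames export with point geometries and a 'geonameid' column.
places = gpd.read_file("geonames_places.geojson")

# Buffer each point; the radius (in the units of the CRS) is an arbitrary choice here.
buffers = places.geometry.buffer(0.1)

graph = nx.Graph()
graph.add_nodes_from(places["geonameid"])

# Add an edge between two places whenever their buffers intersect.
for i, j in itertools.combinations(places.index, 2):
    if buffers[i].intersects(buffers[j]):
        graph.add_edge(places.loc[i, "geonameid"], places.loc[j, "geonameid"])
```

Note that this pairwise loop is quadratic; for a full GeoNames extract, a spatial join (e.g. `geopandas.sjoin`) would be preferable.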
Then, using this topology graph, we apply node-embedding techniques to generate an embedding for each vertex (place).
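For example, a DeepWalk-style pipeline (uniform random walks fed to gensim's Word2Vec) is one way to produce such vertex embeddings; this sketch illustrates the idea and is not necessarily the exact technique used here. It reuses the `graph` built above.

```python
import random
from gensim.models import Word2Vec

def random_walks(graph, num_walks=10, walk_length=40, seed=42):
    """Generate DeepWalk-style uniform random walks over the topology graph."""
    rng = random.Random(seed)
    walks = []
    nodes = list(graph.nodes())
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append([str(node) for node in walk])
    return walks

walks = random_walks(graph)
# gensim >= 4.0 API (use size= instead of vector_size= for older versions).
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, workers=4)

# Look up the embedding of any place by its (stringified) node id:
# vector = model.wv[str(node_id)]
```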