From 33651e79d5ec6c03199db5d1d8989dd1842ad00c Mon Sep 17 00:00:00 2001
From: Jacques Fize <jacques.fize@insa-lyon.fr>
Date: Fri, 14 Feb 2020 16:16:52 +0100
Subject: [PATCH] Change README

---
 README.md | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 6bdd557..b43d729 100644
--- a/README.md
+++ b/README.md
@@ -26,24 +26,24 @@ For Anaconda users
     while read requirement; do conda install --yes $requirement; done < requirements.txt
 
 
-
 <hr>
 
 ## Prepare required data
 
 ### Geonames data
- * download the Geonames data use to train the network [here](download.geonames.org/export/dump/)
- * download the hierarchy data [here](http://download.geonames.org/export/dump/hierarchy.zip)
- * unzip both file in the directory of your choice
- * run the script `train_test_split_geonames.py <geoname_filename>`
+ 1. Download the Geonames data used to train the network [here](http://download.geonames.org/export/dump/)
+ 2. Download the hierarchy data [here](http://download.geonames.org/export/dump/hierarchy.zip)
+ 3. Unzip both files in the directory of your choice
+ 4. Run the script `train_test_split_geonames.py <geoname_filename>` (a shell sketch of these steps follows this list)
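+
+A minimal shell sketch of steps 1-4, assuming the French country dump `FR.zip` (which unpacks to `FR.txt`) and that `train_test_split_geonames.py` sits at the repository root; adapt the names and paths to the dump you actually use:
+
+    # illustrative sketch only, adjust the dump name and paths to your setup
+    mkdir -p data/geonames && cd data/geonames
+    wget http://download.geonames.org/export/dump/FR.zip           # 1. Geonames data (here: France)
+    wget http://download.geonames.org/export/dump/hierarchy.zip    # 2. hierarchy data
+    unzip FR.zip && unzip hierarchy.zip                            # 3. unzip both files
+    cd ../.. && python3 train_test_split_geonames.py data/geonames/FR.txt   # 4. build the train/test split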
 
 ### Cooccurrence data
 
- * First, you must download the Wikipedia corpus from which you want to extract co-occurrences : [English Wikipedia Corpus](https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2)
- * Parse the corpus with Gensim script using the following command : `python3 -m gensim.scripts.segment_wiki -i -f <wikicorpus> -o <1stoutputname>.json.gz`
- * Build a page of interest file that contains a list of Wikipedia pages. The file must be a csv with the following column : title,latitude,longitude.<br> You can find [here](https://projet.liris.cnrs.fr/hextgeo/files/place_en_fr_page_clean.csv) a page of interest file that contains places that appears in both FR and EN wikipedia.
- * Then using and index that contains pages of interest run the command : `python3 script/get_cooccurrence.py <page_of_interest_file> <2noutputname> -c <1stoutputname>.json.gz`
- * Finally, split the resulting dataset with the script `train_test_split_cooccurrence_data.py <2ndoutputname>`
+ 5. First, download the Wikipedia corpus from which you want to extract co-occurrences: [English Wikipedia Corpus](https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2)
+ 6. Parse the corpus with the Gensim segmentation script using the following command: `python3 -m gensim.scripts.segment_wiki -i -f <wikicorpus> -o <1stoutputname>.json.gz`
+ 7. Build a page of interest file that contains a list of Wikipedia pages. The file must be a CSV with the following columns: title,latitude,longitude.<br> You can find [here](https://projet.liris.cnrs.fr/hextgeo/files/place_en_fr_page_clean.csv) a page of interest file that contains places appearing in both the French and English Wikipedia.
+ 8. Then, using the page of interest file, extract the co-occurrences with the command: `python3 script/get_cooccurrence.py <page_of_interest_file> <2ndoutputname> -c <1stoutputname>.json.gz`
+ 9. Finally, split the resulting dataset with the script `train_test_split_cooccurrence_data.py <2ndoutputname>` (a full sketch of steps 5-9 follows this list)
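+
+Put together, steps 5-9 look roughly like this. The names `enwiki.json.gz` and `cooc_en.csv` are placeholders standing in for `<1stoutputname>` and `<2ndoutputname>`, and the page of interest file is the example one linked above:
+
+    # illustrative sketch using the English dump and the example page of interest file
+    wget https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2
+    python3 -m gensim.scripts.segment_wiki -i -f enwiki-20200201-pages-articles.xml.bz2 -o enwiki.json.gz
+    wget https://projet.liris.cnrs.fr/hextgeo/files/place_en_fr_page_clean.csv
+    python3 script/get_cooccurrence.py place_en_fr_page_clean.csv cooc_en.csv -c enwiki.json.gz
+    python3 train_test_split_cooccurrence_data.py cooc_en.csv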
+
 ### If you're in a hurry
 
 French Geonames data, French Wikipedia co-occurrence data, and their train/test splits can be found here: [https://projet.liris.cnrs.fr/hextgeo/files/](https://projet.liris.cnrs.fr/hextgeo/files/)
@@ -52,11 +52,9 @@ French Geonames, French Wikipedia cooccurence data, and their train/test splits
 
 ## Train the network
 
-The script `combination_embeddings.py` is the one responsible of the neural network training
-
 To train the network with default parameters, use the following command:
 
-    python3 combination_embeddings.py -a -i <geoname data filename> <hierarchy geonames data filename>
+    python3 combination_embeddings.py -i <geoname data filename> <hierarchy geonames data filename>
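+
+For example, with the French files prepared above (the file names are illustrative, use whatever you downloaded):
+
+    python3 combination_embeddings.py -i FR.txt hierarchy.txt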
 
 ### Available parameters
 
@@ -73,4 +71,4 @@ To train the network with default parameter use the following command :
 | -t,--tolerance-value  | K-value in the computation of the accuracy@k                                    |
 | -e,--epochs           | number of epochs                                                                |
 | -d,--dimension        | size of the ngram embeddings                                                    |
-| --admin_code_1        | (Optional) If you wish to train the network on a specificate region             |
+| --admin_code_1        | (Optional) If you wish to train the network on a specific region             |
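+
+As a quick illustration of how these options combine, a run with explicit hyperparameters could look like this (the values and the region code are placeholders, not recommendations):
+
+    python3 combination_embeddings.py -i FR.txt hierarchy.txt -e 100 -d 256 -t 100 --admin_code_1 84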
-- 
GitLab