Commit 33651e79 authored by Jacques Fize

Change README

parent 5a46b9f9
For Anaconda users

    while read requirement; do conda install --yes $requirement; done < requirements.txt
<hr>

## Prepare required data

### Geonames data
1. Download the Geonames data used to train the network [here](http://download.geonames.org/export/dump/)
2. Download the hierarchy data [here](http://download.geonames.org/export/dump/hierarchy.zip)
3. Unzip both files in the directory of your choice
4. Run the script `train_test_split_geonames.py <geoname_filename>` (a full sketch of these steps follows this list)
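Taken together, these four steps can be scripted as the sketch below; the French dump `FR.zip` (which extracts to `FR.txt`) is just an example, and any country file from the dump directory works the same way:

    # Steps 1-2: fetch a Geonames country dump (FR as an example) and the hierarchy data
    wget http://download.geonames.org/export/dump/FR.zip
    wget http://download.geonames.org/export/dump/hierarchy.zip
    # Step 3: unzip both archives
    unzip FR.zip
    unzip hierarchy.zip
    # Step 4: build the train/test split from the extracted file
    python3 train_test_split_geonames.py FR.txt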
### Cooccurrence data
5. First, download the Wikipedia corpus from which you want to extract co-occurrences: [English Wikipedia Corpus](https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2)
6. Parse the corpus with the Gensim script using the following command: `python3 -m gensim.scripts.segment_wiki -i -f <wikicorpus> -o <1stoutputname>.json.gz`
7. Build a page of interest file that contains a list of Wikipedia pages. The file must be a CSV with the following columns: title,latitude,longitude.<br> You can find [here](https://projet.liris.cnrs.fr/hextgeo/files/place_en_fr_page_clean.csv) a page of interest file that contains places that appear in both the French and English Wikipedia.
8. Then, using an index that contains the pages of interest, run the command: `python3 script/get_cooccurrence.py <page_of_interest_file> <2ndoutputname> -c <1stoutputname>.json.gz`
9. Finally, split the resulting dataset with the script `train_test_split_cooccurrence_data.py <2ndoutputname>` (the sketch below chains all five steps)
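End to end, steps 5 through 9 chain together as in the sketch below; the intermediate names `enwiki.json.gz` and `cooccurrences.csv` stand in for `<1stoutputname>` and `<2ndoutputname>`, and the page of interest file is the prebuilt one from step 7:

    # Step 5: download the Wikipedia dump
    wget https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2
    # Step 6: segment the dump into one JSON article per line
    python3 -m gensim.scripts.segment_wiki -i -f enwiki-20200201-pages-articles.xml.bz2 -o enwiki.json.gz
    # Step 7: use the prebuilt page of interest file (columns: title,latitude,longitude)
    wget https://projet.liris.cnrs.fr/hextgeo/files/place_en_fr_page_clean.csv
    # Step 8: extract co-occurrences for the pages of interest
    python3 script/get_cooccurrence.py place_en_fr_page_clean.csv cooccurrences.csv -c enwiki.json.gz
    # Step 9: split the result into train and test sets
    python3 train_test_split_cooccurrence_data.py cooccurrences.csv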
### If you're in a hurry

French Geonames data, French Wikipedia cooccurrence data, and their train/test splits can be found here: [https://projet.liris.cnrs.fr/hextgeo/files/](https://projet.liris.cnrs.fr/hextgeo/files/)
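One way to fetch the whole directory at once is a recursive `wget`, sketched below; adjust it to the files you actually need:

    # Mirror the prebuilt data files into the current directory
    wget -r -np -nH --cut-dirs=2 -R "index.html*" https://projet.liris.cnrs.fr/hextgeo/files/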
## Train the network

The script `combination_embeddings.py` is responsible for training the neural network.

To train the network with default parameters, use the following command:

    python3 combination_embeddings.py -i <geoname data filename> <hierarchy geonames data filename>
### Available parameters
| Parameter | Description |
|---|---|
| -t,--tolerance-value | K-value in the computation of the accuracy@k |
| -e,--epochs | number of epochs |
| -d,--dimension | size of the ngram embeddings |
| --admin_code_1 | (Optional) If you wish to train the network on a specific region |
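As an illustration, a run overriding some of these defaults might look like the following; the values are placeholders, not recommended settings:

    # illustrative values only: 100 epochs, 256-dimensional ngram embeddings, accuracy@100
    python3 combination_embeddings.py -e 100 -d 256 -t 100 -i <geoname data filename> <hierarchy geonames data filename>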