Commit 99cf3e26 authored by Fize Jacques

Debug

parent 77e03390
@@ -96,8 +96,8 @@ The data preparation is divided into three steps. First, we retrieve required da
 1. First, you must download the Wikipedia corpus from which you want to extract co-occurrences: [English Wikipedia Corpus](https://dumps.wikimedia.org/enwiki/20200201/enwiki-20200201-pages-articles.xml.bz2)
 2. Parse the corpus with the Gensim script using the following command: `python3 -m gensim.scripts.segment_wiki -i -f <wikicorpus> -o <1stoutputname>.json.gz`
-3. Build a page of interest file that contains a list of Wikipedia pages. Use the script `extract_pages_of_interest.py` for that. You can find [here](https://projet.liris.cnrs.fr/hextgeo/files/pages_of_interest/place_en_fr_page_clean.csv) a page of interest file that contains places that appear in both FR and EN wikipedia.
-4. Then using and index that contains pages of interest run the command : `python3 script/get_cooccurrence.py <page_of_interest_file> <2noutputname> -c <1stoutputname>.json.gz`
+3. Build a page of interest file that contains a list of Wikipedia pages. Use the script `extract_pages_of_interest.py` for that. You can find [here](https://projet.liris.cnrs.fr/hextgeo/files/pages_of_interest/place_en_fr_page_clean.csv) a page of interest file that contains places that appear in FR or EN wikipedia.
+4. Then, using the page of interest file, run the command: `python3 script/get_cooccurrence.py <page_of_interest_file> <final_output_name> -c <1stoutputname>.json.gz`
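
Concretely, with the dump from step 1 and the ready-made page-of-interest file linked in step 3, steps 2 and 4 chain together as shown below. The output names `enwiki.json.gz` and `cooccurrence_en.csv` are illustrative placeholders, not names fixed by the project:

    # Step 2: segment the dump into per-article JSON (-i includes interlinks)
    python3 -m gensim.scripts.segment_wiki -i -f enwiki-20200201-pages-articles.xml.bz2 -o enwiki.json.gz
    # Step 4: extract co-occurrences for the pages listed in the page-of-interest file
    python3 script/get_cooccurrence.py place_en_fr_page_clean.csv cooccurrence_en.csv -c enwiki.json.gz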
### Generate dataset

@@ -113,6 +113,7 @@ Use the following command to generate the datasets for training your model.
 | --adj-nside    | Healpix resolution within which places are considered adjacent |
 | --split-nside  | Size of the zone where the train/test split is done |
 | --split-method | [per_pair\|per_entity] Split each dataset by place (a place cannot appear in both train and test) or by pair (a place can appear in both train and test) |
+| --no-sampling  | To avoid sampling in generated pairs |
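
The `--split-method` option is the subtle one: it controls where a place may appear, not just how the pairs are divided. Below is a minimal sketch of the two strategies, an illustration under assumed data shapes rather than the repository's implementation:

    import random

    def split_pairs(pairs, method="per_entity", test_ratio=0.2, seed=42):
        # Illustrative train/test split over (place_a, place_b) pairs.
        rng = random.Random(seed)
        if method == "per_pair":
            # Split raw pairs: the same place may appear in train and test.
            shuffled = list(pairs)
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * (1 - test_ratio))
            return shuffled[:cut], shuffled[cut:]
        # per_entity: hold out whole places so none appears on both sides.
        places = sorted({place for pair in pairs for place in pair})
        rng.shuffle(places)
        test_places = set(places[:int(len(places) * test_ratio)])
        train = [p for p in pairs if not set(p) & test_places]
        # Pairs mixing a train place and a test place are dropped to avoid leakage.
        test = [p for p in pairs if set(p) <= test_places]
        return train, test

    # Tiny demo: with per_entity, no place leaks across the split.
    pairs = [("Paris", "Lyon"), ("Paris", "Nice"), ("Lyon", "Marseille"),
             ("Nice", "Marseille"), ("Lyon", "Nice")]
    print(split_pairs(pairs, method="per_entity", test_ratio=0.5))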
### If you're in a hurry

@@ -123,7 +124,7 @@ French (also GB,US) Geonames, French (also GB,US) Wikipedia co-occurrence data,
 To train the first model, use the following command:
 
-    python3 train_geocoder_v2.py <dataset_name> <inclusion_dataset> <adjacent_dataset> <cooccurrence_dataset> [-i | -a | -w ]+ [optional args]
+    python3 train_geocoder.py <dataset_name> <inclusion_dataset> <adjacent_dataset> <cooccurrence_dataset> [-i | -a | -w ]+ [optional args]
 
 | Parameter | Description |
 |-----------------------|---------------------------------------------------------------------------------|
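As a concrete invocation of the renamed script (file names are placeholders, and the reading of the flags is an assumption: `-i`, `-a`, and `-w` presumably enable the inclusion, adjacency, and co-occurrence relations, with at least one required by the usage line above):

    python3 train_geocoder.py FR inclusion_FR.csv adjacency_FR.csv cooccurrence_FR.csv -i -a -w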
@@ -185,15 +185,15 @@ class EpochTimer(Callback):
             self.epoch = pd.read_csv(log_filename).Epoch.max()
         else:
             self.output = open(log_filename,'w')
-            self.output.write("{0},{1}\n".format("Epoch","Execution Time"))
+            self.output.write("{0},{1},{2},{3},{4},{5}\n".format("Epoch","Execution Time","Loss","Val_Loss","Accuracy","Accuracy_val"))
             self.output.flush()
 
     def on_epoch_begin(self,epoch, logs={}):
         self.timer = time.time()
 
-    def on_epoch_end(self, epoch, logs=None):
+    def on_epoch_end(self, epoch, logs={}):
         end_time = time.time() - self.timer
-        self.output.write("{0},{1}\n".format(self.epoch,end_time))
+        self.output.write("{0},{1},{2},{3},{4},{5}\n".format(self.epoch,end_time,logs["loss"],logs["val_loss"],logs["compute_metric"],logs["val_compute_metric"]))
         self.output.flush()
         self.epoch += 1
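For reference, here is the patched callback assembled into a self-contained class. Lines hidden by the collapsed hunk are filled in with assumptions, flagged in comments (the `os.path.exists` resume check, the append mode, and the Keras import path):

    import os
    import time

    import pandas as pd
    from tensorflow.keras.callbacks import Callback  # assumed import path

    class EpochTimer(Callback):
        """Log wall-clock time, losses, and accuracies per epoch to a CSV file."""

        def __init__(self, log_filename):
            super().__init__()
            self.epoch = 0
            if os.path.exists(log_filename):  # assumed: resume an interrupted run
                self.epoch = pd.read_csv(log_filename).Epoch.max()
                self.output = open(log_filename, 'a')  # assumed append mode
            else:
                self.output = open(log_filename, 'w')
                self.output.write("{0},{1},{2},{3},{4},{5}\n".format(
                    "Epoch", "Execution Time", "Loss", "Val_Loss",
                    "Accuracy", "Accuracy_val"))
                self.output.flush()

        def on_epoch_begin(self, epoch, logs={}):
            # Start the timer at the beginning of each epoch.
            self.timer = time.time()

        def on_epoch_end(self, epoch, logs={}):
            end_time = time.time() - self.timer
            # "compute_metric" appears to be the project's custom accuracy metric.
            self.output.write("{0},{1},{2},{3},{4},{5}\n".format(
                self.epoch, end_time, logs["loss"], logs["val_loss"],
                logs["compute_metric"], logs["val_compute_metric"]))
            self.output.flush()
            self.epoch += 1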