wikstraktor
===========

A Python tool to query the [Wiktionary](https://wiktionary.org) and extract [structured lexical data](https://gitlab.liris.cnrs.fr/lex-game/wikstraktor/-/wikis/Entry-structure).

This experimental version identifies every piece of structured information and merges information from different sources.

## Dependencies
This project depends on the following Python packages and standard-library modules.
* [``pywikibot``](https://github.com/wikimedia/pywikibot) gives access to the MediaWiki API
    * [documentation](https://doc.wikimedia.org/pywikibot/stable/api_ref/pywikibot.html)
    * [manual](https://www.mediawiki.org/wiki/Manual:Pywikibot)
    * [configuration for the wiktionary](https://github.com/wikimedia/pywikibot/blob/master/pywikibot/families/wiktionary_family.py)
* [``wikitextparser``](https://github.com/5j9/wikitextparser) parses MediaWiki pages and extracts sections, templates and links
    * [documentation](https://wikitextparser.readthedocs.io/en/latest/#api-reference)
* [``importlib``](https://docs.python.org/3/library/importlib.html) to import parser modules
* [``sqlite3``](https://docs.python.org/3/library/sqlite3.html) for logs
* [``gitpython``](https://gitpython.readthedocs.io/en/stable/) for logs
* [``json``](https://docs.python.org/3/library/json.html) for JSON handling
* [``re``](https://docs.python.org/3/library/re.html) for regular expressions
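
As an illustration of the ``importlib`` entry above, the sketch below loads a module from its name at run time, which is the general mechanism for picking a parser per language. The function name ``load_parser_module`` and the fallback behaviour are assumptions made for this example, not wikstraktor's actual code; the standard ``json`` module stands in for a parser module.

```python
import importlib


def load_parser_module(module_name):
    """Import a module from its dotted name; return None if it does not exist."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


# The stdlib "json" module stands in for a language-specific parser module.
module = load_parser_module("json")
print(module.__name__)

# A name that cannot be resolved yields None instead of raising.
print(load_parser_module("hypothetical_missing_parser"))
```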

## Installation
(This may eventually be replaced by some form of automation. Using a virtual environment might be better; see the [server version](#wikstraktor-server).)

### Basic version
```bash
python3 -m venv wikstraktorenv #optional for basic version
. wikstraktorenv/bin/activate #activate environment (optional)
pip install -r requirements.txt
./setup.py
```

### Wikstraktor Server
To run wikstraktor as a server, you need to install [flask](https://flask.palletsprojects.com/en/2.0.x/installation/) and [flask-cors](https://flask-cors.readthedocs.io/en/latest/) (which allows other domains to query the server). Best practice is to do so in a [virtual environment](https://docs.python.org/3/library/venv.html#module-venv).

The following commands are taken from the documentation linked above. It is probably safer to follow the links and read each module's documentation:
```bash
python3 -m venv wikstraktorenv #create wikstraktorenv environment
. wikstraktorenv/bin/activate #activate environment
pip install -r server_requirements.txt
./setup.py
```

## Use
### Wikstraktor
#### Python
```python
from wikstraktor import Wikstraktor
f = Wikstraktor.get_instance('fr', 'en') #create a wikstraktor,
    # first parameter is the language of the wiki
    # second parameter is the language of the word sought for
f.fetch("blue") #fetch an article
str(f) #convert content to json
```

#### Bash
The command-line help is printed in French. In English, the options are:
* ``-l``, ``--language``: the language of the word
* ``-w``, ``--wiki_language``: the language of the wiki
* ``-m``, ``--mot``: the word to look up
* ``-f``, ``--destination_file``: the file in which to store the result
* ``-A``, ``--force_ascii``: JSON with ASCII characters only
* ``-C``, ``--compact``: JSON without indentation

```
usage: wikstraktor.py [-h] [-l LANGUAGE] [-w WIKI_LANGUAGE] [-m MOT]
                      [-f DESTINATION_FILE] [-A] [-C]

Interroger un wiktionnaire
	ex :
	‣./wikstraktor.py -m blue
	‣./wikstraktor.py -m blue -f blue.json -A -C
	‣./wikstraktor.py -l en -w fr -m blue -f blue.json -A -C

options:
  -h, --help            show this help message and exit
  -l LANGUAGE, --language LANGUAGE
                        la langue du mot
  -w WIKI_LANGUAGE, --wiki_language WIKI_LANGUAGE
                        la langue du wiki
  -m MOT, --mot MOT     le mot à chercher
  -f DESTINATION_FILE, --destination_file DESTINATION_FILE
                        le fichier dans lequel stocker le résultat
  -A, --force_ascii     json avec que des caractères ascii
  -C, --compact         json sans indentation
```
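
The ``-A`` and ``-C`` options map naturally onto two parameters of the standard ``json.dumps`` function. The sketch below shows their effect on a made-up entry. The helper ``to_json``, the indentation width and the sample data are hypothetical, and whether wikstraktor implements the options exactly this way is an assumption.

```python
import json


def to_json(data, force_ascii=False, compact=False):
    """Serialize data, mirroring the effect described for -A and -C."""
    return json.dumps(
        data,
        ensure_ascii=force_ascii,       # -A: escape every non-ASCII character
        indent=None if compact else 4,  # -C: no indentation, single line
    )


entry = {"word": "bleuâtre"}  # made-up sample, not real wikstraktor output

print(to_json(entry))                                  # indented, keeps "â"
print(to_json(entry, force_ascii=True, compact=True))  # one line, "â" escaped
```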

### Wikstraktor Server
The server runs on port 5000 by default; you can change that in the ``wikstraktor_server_config.py`` file.
```bash
./wikstraktor_server.py
```
The API is very simple:
* ``GET server_url/search/<word>``: searches for the word in the default Wiktionary
* ``GET server_url/search/<wiktlang>/<wordlang>/<word>``: searches for the word in language ``wordlang`` in the ``wiktlang`` Wiktionary

Both API calls return a JSON object.
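
The two routes above can be queried from Python with the standard library alone. This is a minimal sketch: the base URL ``http://localhost:5000`` assumes a local server on the default port, and the helper names ``build_search_url`` and ``search`` are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

SERVER_URL = "http://localhost:5000"  # assumption: local server, default port


def build_search_url(word, wiktlang=None, wordlang=None, server_url=SERVER_URL):
    """Build the URL for one of the two search routes."""
    parts = [server_url.rstrip("/"), "search"]
    # Use the long route only when both languages are given.
    if wiktlang is not None and wordlang is not None:
        parts += [quote(wiktlang, safe=""), quote(wordlang, safe="")]
    parts.append(quote(word, safe=""))  # percent-encode non-ASCII words
    return "/".join(parts)


def search(word, wiktlang=None, wordlang=None, server_url=SERVER_URL):
    """Send the GET request and decode the JSON response."""
    url = build_search_url(word, wiktlang, wordlang, server_url)
    with urlopen(url) as response:
        return json.load(response)


print(build_search_url("blue"))
print(build_search_url("blue", wiktlang="fr", wordlang="en"))
```

With the server running, ``search("blue", wiktlang="fr", wordlang="en")`` would return the decoded JSON object.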

## Licence
TODO but will be open source