wikstraktor
===========
A Python tool to query the [Wiktionary](https://wiktionary.org) and extract [structured lexical data](https://gitlab.liris.cnrs.fr/lex-game/wikstraktor/-/wikis/Entry-structure).

It experimentally identifies every piece of structured information and merges information from different sources.

## Dependencies
This project depends on the following Python packages and standard-library modules.
* [``pywikibot``](https://github.com/wikimedia/pywikibot) gives access to the MediaWiki API
    * [documentation](https://doc.wikimedia.org/pywikibot/stable/api_ref/pywikibot.html)
    * [manual](https://www.mediawiki.org/wiki/Manual:Pywikibot)
    * [configuration for the wiktionary](https://github.com/wikimedia/pywikibot/blob/master/pywikibot/families/wiktionary_family.py)
* [``wikitextparser``](https://github.com/5j9/wikitextparser) parses MediaWiki pages and extracts sections, templates and links
    * [documentation](https://wikitextparser.readthedocs.io/en/latest/#api-reference)
* [``importlib``](https://docs.python.org/3/library/importlib.html): to import parser modules
* [``sqlite3``](https://docs.python.org/3/library/sqlite3.html) for logs
* [``gitpython``](https://gitpython.readthedocs.io/en/stable/) for logs
* [``json``](https://docs.python.org/3/library/json.html) for JSON use
* [``re``](https://docs.python.org/3/library/re.html)

## Installation
(This may later be replaced by some form of automation. Using a virtual environment is probably better; see the [server version](#wikstraktor-server).)

### Basic version
```bash
python3 -m venv wikstraktorenv #optional for basic version
. wikstraktorenv/bin/activate #activate environment (optional)
pip install -r requirements.txt
./setup.py
```

### Wikstraktor Server
To run wikstraktor as a server, you need to install [flask](https://flask.palletsprojects.com/en/2.0.x/installation/) and [flask-cors](https://flask-cors.readthedocs.io/en/latest/) (which allows other domains to query the server). Best practice is to do so in a [virtual environment](https://docs.python.org/3/library/venv.html#module-venv).

The following commands are taken from the aforementioned documentation; it is probably safer to follow the links and the modules' own documentation:
```bash
python3 -m venv wikstraktorenv #create wikstraktorenv environment
. wikstraktorenv/bin/activate #activate environment
pip install -r server_requirements.txt
./setup.py
```

## Use
### Wikstraktor
#### Python
```python
from wikstraktor import Wikstraktor
f = Wikstraktor.get_instance('fr', 'en') #create a wikstraktor,
    # first parameter is the language of the wiki
    # second parameter is the language of the word sought for
f.fetch("blue") #fetch an article
str(f) #convert content to json
```

#### Bash
```
usage: wikstraktor.py [-h] [-l LANGUAGE] [-w WIKI_LANGUAGE] [-m MOT] [-f DESTINATION_FILE] [-A] [-C]

Interroger un wiktionnaire
	ex :
	‣./wikstraktor.py -m blue
	‣./wikstraktor.py -m blue -f blue.json -A -C
	‣./wikstraktor.py -l en -w fr -m blue -f blue.json -A -C

options:
  -h, --help            show this help message and exit
  -l LANGUAGE, --language LANGUAGE
                        la langue du mot
  -w WIKI_LANGUAGE, --wiki_language WIKI_LANGUAGE
                        la langue du wiki
  -m MOT, --mot MOT     le mot à chercher
  -f DESTINATION_FILE, --destination_file DESTINATION_FILE
                        le fichier dans lequel stocker le résultat
  -A, --force_ascii     json avec que des caractères ascii
  -C, --compact         json sans indentation
```

The help text is in French: the description reads "Query a wiktionary", ``-l`` is the language of the word, ``-w`` the language of the wiki, ``-m`` the word to look up, ``-f`` the file in which to store the result, ``-A`` outputs JSON with ASCII characters only, and ``-C`` outputs JSON without indentation.

### Wikstraktor Server
The server runs by default on port 5000; you can change that in the ``wikstraktor_server_config.py`` file.
```bash
./wikstraktor_server.py
```

Then there is a very simple API:
* ``GET server_url/search/<word>``: searches the word in the default wiktionary
* ``GET server_url/search/<wiktlang>/<wordlang>/<word>``: searches the word in wordlang in the wiktlang wiktionary

Both API calls return a JSON object.

## Licence
TODO but will be open source
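
## Example: querying the server from Python
The two server routes described under Use can be called from any HTTP client. The following is only a minimal sketch using the standard library, not part of wikstraktor itself. The helper names ``build_search_url`` and ``search`` are hypothetical, and the sketch assumes the server runs locally on its default port (5000) and accepts percent-encoded path segments.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the server runs locally on its default port (5000).
DEFAULT_SERVER_URL = "http://localhost:5000"


def build_search_url(word, wikt_lang=None, word_lang=None,
                     server_url=DEFAULT_SERVER_URL):
    """Build the URL for one of the two documented search routes.

    Hypothetical helper: with only `word`, it targets search/<word>;
    with both languages, search/<wiktlang>/<wordlang>/<word>.
    """
    if (wikt_lang is None) != (word_lang is None):
        raise ValueError("wikt_lang and word_lang must be given together")
    parts = ["search"]
    if wikt_lang is not None:
        parts += [wikt_lang, word_lang]
    parts.append(word)
    # Percent-encode each path segment so non-ASCII words survive the request.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{server_url.rstrip('/')}/{path}"


def search(word, wikt_lang=None, word_lang=None,
           server_url=DEFAULT_SERVER_URL):
    """Send the GET request and decode the JSON body of the response."""
    url = build_search_url(word, wikt_lang, word_lang, server_url)
    with urlopen(url) as response:
        return json.load(response)
```

With a server running, ``search("blue")`` would query the default wiktionary and ``search("blue", "fr", "en")`` would look up the English word in the French wiktionary. The structure of the returned JSON is not covered by this sketch.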