wikstraktor
A Python tool to query Wiktionary and extract structured lexical data.
It is an experimental attempt to identify every piece of structured information and to merge information from different sources.
Dependencies
This project depends on the following Python packages:
- pywikibot: gives access to the MediaWiki API
- wikitextparser: parses MediaWiki pages and extracts sections, templates and links
- importlib: imports the parser modules
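The importlib dependency lets parser modules be loaded at runtime. The sketch below shows that general technique with `importlib.import_module`. The `parsers` package, the `<wiki language>_<word language>` naming scheme and the helper names are hypothetical and do not necessarily match wikstraktor's own layout.

```python
import importlib
from types import ModuleType


def parser_module_name(wiki_lang: str, word_lang: str, package: str = "parsers") -> str:
    """Build a dotted module name such as 'parsers.fr_en' (hypothetical scheme)."""
    return f"{package}.{wiki_lang}_{word_lang}"


def load_parser(wiki_lang: str, word_lang: str, package: str = "parsers") -> ModuleType:
    """Import the parser module for a (wiki language, word language) pair at runtime."""
    name = parser_module_name(wiki_lang, word_lang, package)
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        # Translate the error only when the requested module (or its package)
        # is missing; a failing import inside an existing parser is re-raised.
        if err.name is not None and (name == err.name or name.startswith(err.name + ".")):
            raise ValueError(f"no parser available for {wiki_lang}/{word_lang}") from err
        raise


if __name__ == "__main__":
    # Demonstration with a real standard-library module: 'encodings.utf_8'.
    print(load_parser("utf", "8", package="encodings").__name__)
```

Dispatching on a module name keeps the core independent of the set of supported languages: adding a language pair only means adding a module.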
Installation
(This may later be replaced by some form of automation; using a virtual environment, as in the server installation below, is probably better.)
pip install pywikibot
pip install wikitextparser
pip install gitpython
- pip install sqlite3 (might already be provided with Python)
- pip install importlib (optional, for Python 2.*, not tested)
- run ./setup.py (stores the wikstraktor version in Wiktionary extracts)
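Both sqlite3 and importlib ship with the standard Python 3 distribution, so those two pip commands are usually unnecessary. Here is a small sketch that lists which of the modules above the import system cannot find before you run pip. The module list mirrors the commands above. gitpython is imported as `git`.

```python
import importlib.util

# Top-level import names behind the pip commands above.
MODULES = ["pywikibot", "wikitextparser", "git", "sqlite3", "importlib"]


def missing_modules(names=MODULES):
    """Return the names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```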
Wikstraktor Server
If you want to run wikstraktor as a server, you need to install Flask and flask-cors (the latter allows other domains to query the server). Best practice is to install them in a virtual environment.
The following commands are taken from the documentation of those modules; it is probably safer to follow their own installation instructions:
python3 -m venv wikstraktorenv #create wikstraktorenv environment
. wikstraktorenv/bin/activate #activate environment
pip install Flask #install Flask
pip install -U flask-cors #install Flask cors
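Before installing Flask, you can check from Python that the virtual environment is actually active. Inside a venv, `sys.prefix` points to the environment, while `sys.base_prefix` still points to the base installation. A minimal sketch:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python3 -m venv`."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```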
Usage
Wikstraktor
Python
from wikstraktor import Wikstraktor
f = Wikstraktor.get_instance('fr', 'en') # create a wikstraktor:
# the first parameter is the language of the wiki,
# the second parameter is the language of the word to look up
f.fetch("blue") #fetch an article
str(f) #convert content to json
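The calls above can be wrapped to extract several words in one go. The sketch uses only the three calls shown (`get_instance`, `fetch` and `str`) and assumes that `str(f)` returns a valid JSON document. It creates a fresh instance for each word because it is not documented whether `fetch` accumulates results. The factory is passed in as a parameter, so the helper can be tried with a stand-in object and no network access. The helper name `extract_words` is hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable


def extract_words(factory: Callable[[str, str], Any], wiki_lang: str,
                  word_lang: str, words: Iterable[str]) -> Dict[str, Any]:
    """Fetch each word and return {word: parsed JSON content}.

    `factory` is expected to behave like Wikstraktor.get_instance.
    """
    results = {}
    for word in words:
        extractor = factory(wiki_lang, word_lang)   # one extractor per word (assumption)
        extractor.fetch(word)                       # fetch the article
        results[word] = json.loads(str(extractor))  # str() yields the JSON content
    return results
```

With wikstraktor installed, the call would be `extract_words(Wikstraktor.get_instance, 'fr', 'en', ['blue', 'red'])`.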
Bash
usage: wikstraktor.py [-h] [-l LANGUAGE] [-w WIKI_LANGUAGE] [-m MOT]
[-f DESTINATION_FILE] [-A] [-C] [-n]
Query a Wiktionary
e.g.:
  ./wikstraktor.py -m blue
  ./wikstraktor.py -m blue -f blue.json -A -C
  ./wikstraktor.py -l en -w fr -m blue -f blue.json -n -A -C
  ./wikstraktor.py -l en -w fr+en -m particular -f particular.json
options:
  -h, --help            show this help message and exit
  -l LANGUAGE, --language LANGUAGE
                        the language(s) of the word (separated by “+”)
  -w WIKI_LANGUAGE, --wiki_language WIKI_LANGUAGE
                        the language(s) of the wiki (separated by “+”)
  -m MOT, --mot MOT     the word to look up
  -f DESTINATION_FILE, --destination_file DESTINATION_FILE
                        the file in which to store the result
  -A, --force_ascii     JSON with ASCII characters only
  -C, --compact         JSON without indentation
  -n, --no_id           JSON without ids
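To show how these options fit together, here is a sketch that rebuilds the documented interface with `argparse`. It reflects one plausible reading of the flags: `+`-separated language lists become Python lists, and `-A`/`-C` map onto the `ensure_ascii` and `indent` parameters of `json.dumps`. This is not wikstraktor's actual code. The `None` defaults, the indent width and the helper names `split_langs` and `dump` are assumptions, and `-n` is parsed but not implemented.

```python
import argparse
import json
from typing import Any, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Query a Wiktionary")
    parser.add_argument("-l", "--language", help="language(s) of the word, separated by '+'")
    parser.add_argument("-w", "--wiki_language", help="language(s) of the wiki, separated by '+'")
    parser.add_argument("-m", "--mot", help="the word to look up")
    parser.add_argument("-f", "--destination_file", help="file in which to store the result")
    parser.add_argument("-A", "--force_ascii", action="store_true", help="JSON with ASCII characters only")
    parser.add_argument("-C", "--compact", action="store_true", help="JSON without indentation")
    parser.add_argument("-n", "--no_id", action="store_true", help="JSON without ids (not implemented here)")
    return parser


def split_langs(value: Optional[str]) -> List[str]:
    """'fr+en' -> ['fr', 'en']; None or '' -> []."""
    return [lang for lang in value.split("+") if lang] if value else []


def dump(data: Any, args: argparse.Namespace) -> str:
    """Serialize according to -A (ASCII only) and -C (no indentation)."""
    return json.dumps(data, ensure_ascii=args.force_ascii,
                      indent=None if args.compact else 4)  # indent width is an assumption
```

With the last example above, `split_langs(args.wiki_language)` would give `['fr', 'en']`, so one extraction per wiki language is a natural design.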
Wikstraktor Server
By default, the server runs on port 5000; you can change this in the wikstraktor_server_config.py file.
./wikstraktor_server.py
The server then exposes a very simple API:
- GET server_url/search/<word>: searches for the word in the default Wiktionary
- GET server_url/search/<wiktlang>/<wordlang>/<word>: searches for the word, in language wordlang, in the wiktlang Wiktionary
Both API calls return a JSON object.
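Here is a minimal client for this API that uses only the standard library. `server_url` is whatever address the server listens on; for a local run, this is presumably `http://localhost:5000`. The function names are hypothetical. The word is percent-encoded, so words containing spaces or non-ASCII characters remain valid URL path segments.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def search_url(server_url: str, word: str, wiktlang: Optional[str] = None,
               wordlang: Optional[str] = None) -> str:
    """Build a /search URL; give both languages, or neither for the default wiktionary."""
    parts = ["search"]
    if wiktlang is not None and wordlang is not None:
        parts += [wiktlang, wordlang]
    elif wiktlang is not None or wordlang is not None:
        raise ValueError("give both wiktlang and wordlang, or neither")
    parts.append(word)
    path = "/".join(urllib.parse.quote(p, safe="") for p in parts)
    return server_url.rstrip("/") + "/" + path


def search(server_url: str, word: str, wiktlang: Optional[str] = None,
           wordlang: Optional[str] = None, timeout: float = 30.0) -> Any:
    """GET the search endpoint and decode the JSON response."""
    url = search_url(server_url, word, wiktlang, wordlang)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `search("http://localhost:5000", "blue", "fr", "en")` requests `http://localhost:5000/search/fr/en/blue`.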
Licence
TODO, but it will be open source.