wikstraktor
A Python tool to query Wiktionary and extract structured lexical data.
This is experimental: it tries to identify every piece of structured information and merges information from different sources.
Dependencies
This project depends on the following Python packages:
- pywikibot: allows use of the MediaWiki API
- wikitextparser: parses MediaWiki pages and extracts sections, templates and links
- importlib: to import parser modules
- sqlite3: for logs
- gitpython: for logs
- json: for JSON use
- re
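A quick way to check that these modules are importable before running the tool is sketched below. The helper name `missing_modules` is hypothetical and not part of wikstraktor; the only mapping assumed here is that gitpython is imported under the name `git`.

```python
import importlib.util

# Import names for the dependencies listed above.
# gitpython is imported as "git"; the others use their package name.
REQUIRED_MODULES = [
    "pywikibot", "wikitextparser", "importlib",
    "sqlite3", "git", "json", "re",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

This only checks that each module can be located; it does not verify versions, which remain the job of requirements.txt.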
Installation
(This may later be replaced by some form of automation. Using a virtual environment is probably better; see the server version.)
Basic version
python3 -m venv wikstraktorenv #optional for basic version
. wikstraktorenv/bin/activate #activate environment (optional)
pip install -r requirements.txt
./setup.py
Wikstraktor Server
To run wikstraktor as a server, you need to install flask and flask-cors (the latter allows other domains to query the server). Best practice is to do so in a virtual environment.
The following commands are taken from the documentation of those modules; it is probably safer to consult each module's own documentation:
python3 -m venv wikstraktorenv #create wikstraktorenv environment
. wikstraktorenv/bin/activate #activate environment
pip install -r server_requirements.txt
./setup.py
Use
Wikstraktor
Python
from wikstraktor import Wikstraktor
f = Wikstraktor.get_instance('fr', 'en') #create a wikstraktor,
# first parameter is the language of the wiki
# second parameter is the language of the word sought for
f.fetch("blue") #fetch an article
str(f) #convert content to json
Bash
usage: wikstraktor.py [-h] [-l LANGUAGE] [-w WIKI_LANGUAGE] [-m MOT]
                      [-f DESTINATION_FILE] [-A] [-C]

Query a wiktionary
e.g.:
‣./wikstraktor.py -m blue
‣./wikstraktor.py -m blue -f blue.json -A -C
‣./wikstraktor.py -l en -w fr -m blue -f blue.json -A -C

options:
  -h, --help            show this help message and exit
  -l LANGUAGE, --language LANGUAGE
                        the language of the word
  -w WIKI_LANGUAGE, --wiki_language WIKI_LANGUAGE
                        the language of the wiki
  -m MOT, --mot MOT     the word to look up
  -f DESTINATION_FILE, --destination_file DESTINATION_FILE
                        the file in which to store the result
  -A, --force_ascii     JSON with ASCII characters only
  -C, --compact         JSON without indentation
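The -A and -C options presumably correspond to standard JSON serialization settings. The sketch below shows the equivalent effect with Python's json module. The function `dump_entry` and the sample data are hypothetical, and wikstraktor's own implementation may differ (for example in its indentation width).

```python
import json


def dump_entry(data, force_ascii=False, compact=False):
    """Serialize `data` roughly the way -A and -C are described.

    force_ascii: escape non-ASCII characters (like -A).
    compact: no indentation (like -C); otherwise indent for readability.
    The indent width of 4 is an assumption, not taken from wikstraktor.
    """
    return json.dumps(
        data,
        ensure_ascii=force_ascii,
        indent=None if compact else 4,
    )


# Hypothetical sample data, not real wikstraktor output.
sample = {"word": "bleu", "note": "café"}

print(dump_entry(sample))                                  # indented, accents kept
print(dump_entry(sample, force_ascii=True, compact=True))  # one line, accents escaped
```

Forcing ASCII is mainly useful when the output file will be read by tools that do not handle UTF-8 well; otherwise the default keeps accented words readable.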
Wikstraktor Server
The server runs on port 5000 by default; you can change that in the wikstraktor_server_config.py file.
./wikstraktor_server.py
Then there is a very simple API:
- GET server_url/search/<word>: searches for the word in the default wiktionary
- GET server_url/search/<wiktlang>/<wordlang>/<word>: searches for the word in wordlang in the wiktlang wiktionary
Both API calls return a JSON object.
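The two routes above could be called from Python roughly as follows. This is a minimal client sketch using only the standard library. The function names `build_search_url` and `search` are hypothetical, and the base URL assumes a server running locally on the default port 5000.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL: a local server on the default port 5000.
SERVER_URL = "http://localhost:5000"


def build_search_url(word, wiktlang=None, wordlang=None, server_url=SERVER_URL):
    """Build the URL for one of the two search routes described above."""
    parts = ["search"]
    # Use the long route only when both languages are given.
    if wiktlang is not None and wordlang is not None:
        parts += [wiktlang, wordlang]
    parts.append(word)
    # Percent-encode each path segment so accented words stay valid in a URL.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{server_url.rstrip('/')}/{path}"


def search(word, wiktlang=None, wordlang=None, server_url=SERVER_URL):
    """Query a running wikstraktor server and decode the JSON response."""
    url = build_search_url(word, wiktlang, wordlang, server_url)
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started, `search("blue", "fr", "en")` would look up the English word "blue" in the French wiktionary; the structure of the returned object is not described here, so inspect it before relying on specific keys.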
Licence
TODO but will be open source