This project provides a lightweight wrapper around the [wiktextract](https://github.com/tatuylonen/wiktextract) project. Where _wiktextract_ aims to parse whole snapshots of the Wiktionary projects (dump files) into machine-readable JSON, this project allows efficient querying of single pages from different Wiktionary editions.
Wiktionary dump files (`<wiktlang>wiktionary-<date>-pages-articles.xml.bz2`) need to be downloaded manually.
The Flask app accepts GET requests at a URL built from three parameters: `<wiktlang>` specifies the language of the desired Wiktionary edition, `<wordlang>` the language of the word, and `<word>` the word itself. The route returns the extracted JSON object for the given query.
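
As a rough illustration of how a client might call the app, the sketch below builds a query URL from the three parameters and decodes the JSON response. The route template `HYPOTHETICAL_ROUTE` and the helper names are assumptions made for this example only; they are not taken from the project, so substitute the actual route defined in `src/app.py`.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder route, NOT taken from the project. Replace it with
# the actual route defined in src/app.py.
HYPOTHETICAL_ROUTE = "/{wiktlang}/{wordlang}/{word}"


def build_query_url(base_url, wiktlang, wordlang, word, route=HYPOTHETICAL_ROUTE):
    """Fill the route template, percent-encoding each path segment."""
    path = route.format(
        wiktlang=quote(wiktlang, safe=""),
        wordlang=quote(wordlang, safe=""),
        word=quote(word, safe=""),
    )
    return base_url.rstrip("/") + path


def query_word(base_url, wiktlang, wordlang, word):
    """Send a GET request and decode the JSON body of the response."""
    with urlopen(build_query_url(base_url, wiktlang, wordlang, word)) as resp:
        return json.load(resp)
```

Percent-encoding each segment keeps words containing non-ASCII characters or slashes from breaking the path. With the development server running locally, `query_word("http://127.0.0.1:5000", ...)` would be the typical call, assuming Flask's default host and port.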
## Local installation
### 1. Download dump files
Download the most recent Wiktionary dump file for each supported Wiktionary edition (see `supported_wiktlangs` in `src/config.py`) from `https://dumps.wikimedia.org/backup-index.html` and place them in the `dumps/` directory. The dump files should follow the pattern `<wiktlang>wiktionary-<date>-pages-articles-multistream.xml.bz2`.
If multiple timestamped dump files per edition are present in the `dumps/` directory, the most recent one will be selected automatically.
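
The selection of the most recent dump per edition could work roughly as follows. This is a minimal sketch, not the project's actual implementation: the function name `latest_dumps`, the regular expression, and the assumption that `<date>` is an eight-digit `YYYYMMDD` stamp are all illustrative.

```python
import re
from pathlib import Path

# Assumption: <date> is an eight-digit YYYYMMDD stamp, as in
# enwiktionary-20230801-pages-articles-multistream.xml.bz2
DUMP_PATTERN = re.compile(
    r"^(?P<wiktlang>[a-z_]+)wiktionary-(?P<date>\d{8})"
    r"-pages-articles-multistream\.xml\.bz2$"
)


def latest_dumps(dump_dir="dumps"):
    """Map each Wiktionary edition to its most recent dump file in dump_dir."""
    latest = {}
    for path in Path(dump_dir).iterdir():
        match = DUMP_PATTERN.match(path.name)
        if match is None:
            continue  # ignore files that do not follow the naming pattern
        lang, date = match["wiktlang"], match["date"]
        # YYYYMMDD strings sort chronologically, so string comparison suffices
        if lang not in latest or date > latest[lang][0]:
            latest[lang] = (date, path)
    return {lang: path for lang, (date, path) in latest.items()}
```

Calling `latest_dumps("dumps")` returns a dictionary keyed by edition code, so older dumps can stay in the directory without being picked up.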
### 2. Create a virtual environment
Create and activate a virtual Python environment with an environment manager of your choice. For example:
```
virtualenv live-query-wiktextract
source live-query-wiktextract/bin/activate
```
### 3. Install dependencies
```
pip install -r requirements.txt
```
_Since wiktextract is not regularly published as a Python package, the dependency is pinned to a specific commit. The commit indicated in `requirements.txt` was used and tested during development._
### 4. Load templates from dump files
Run the script `src/load_templates.py` to extract module and template pages from the dump file into an SQLite database that will be used by `wiktextract`.
```
python src/load_templates.py
```
### 5. Start the Flask app
```
flask --app src/app.py run
```
## Using Docker
Alternatively, the app can be containerized with Docker, for example by building an image with `docker build -t live-query-wiktextract .`. You still have to provide the dump files in `dumps/`.