Commit 31b3a443 authored by Empiriker

Update Readme

parent 613c1068
# live-query-wiktextract
This project provides a lightweight wrapper around the [wiktextract](https://github.com/tatuylonen/wiktextract) project. Where _wiktextract_ aims to parse whole snapshots of the Wiktionary projects (dump files) into machine-readable JSON, this project lets you efficiently query single pages of different Wiktionary editions.

The Flask app accepts GET requests at the URL
```
localhost:5000/search/<wiktlang>/<wordlang>/<word>
```
where `<wiktlang>` specifies the language of the desired Wiktionary edition, `<wordlang>` the language of the word, and `<word>` the word itself to be queried. The route returns the extracted JSON object for the given query.
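
As an illustration of the route described above, the following is a minimal client sketch using only the Python standard library. The helper names `build_search_url` and `query_word` are hypothetical and not part of this project, and the language codes shown in the usage note are assumed values for `<wiktlang>` and `<wordlang>`.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_search_url(wiktlang, wordlang, word, base="http://localhost:5000"):
    """Build the /search/<wiktlang>/<wordlang>/<word> URL (hypothetical helper)."""
    # Percent-encode every path segment so that non-ASCII words are sent safely.
    parts = [quote(part, safe="") for part in (wiktlang, wordlang, word)]
    return f"{base}/search/" + "/".join(parts)


def query_word(wiktlang, wordlang, word, base="http://localhost:5000", timeout=10):
    """Send the GET request and decode the JSON body (hypothetical helper)."""
    url = build_search_url(wiktlang, wordlang, word, base=base)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the app running locally, a call such as `query_word("en", "de", "Haus")` would request `http://localhost:5000/search/en/de/Haus` and return the decoded JSON.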
## Local installation
### 1. Download dump files
Download the most recent Wiktionary dump file for each supported Wiktionary edition (see `supported_wiktlangs` in `src/config.py`) from `https://dumps.wikimedia.org/backup-index.html` and place them in the `dumps/` directory. The dump files should follow the pattern `<wiktlang>wiktionary-<date>-pages-articles-multistream.xml.bz2`.

If multiple timestamped dump files per edition are present in the `dumps/` directory, the most recent one is selected automatically.
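
The selection rule above could be sketched as follows. This is not the project's actual implementation: `latest_dump` and `DUMP_PATTERN` are hypothetical names, and the sketch assumes that `<date>` is an eight-digit `YYYYMMDD` stamp, so that comparing dates as strings also orders them chronologically.

```python
import re

# Assumed file name layout: <wiktlang>wiktionary-<YYYYMMDD>-pages-articles-multistream.xml.bz2
DUMP_PATTERN = re.compile(
    r"^(?P<wiktlang>[a-z-]+)wiktionary-(?P<date>\d{8})"
    r"-pages-articles-multistream\.xml\.bz2$"
)


def latest_dump(filenames, wiktlang):
    """Return the newest dump file name for one edition, or None if there is none."""
    candidates = []
    for name in filenames:
        match = DUMP_PATTERN.match(name)
        if match and match.group("wiktlang") == wiktlang:
            # Tuples compare by date first, so max() picks the newest dump.
            candidates.append((match.group("date"), name))
    return max(candidates)[1] if candidates else None
```

To apply it to a directory, pass the file names it contains, for example `latest_dump([p.name for p in pathlib.Path("dumps").iterdir()], "en")`.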
### 2. Create a virtual environment
Create and activate a virtual Python environment with an environment manager of your choice. For example:
```
virtualenv live-query-wiktextract
source live-query-wiktextract/bin/activate
```
### 3. Install dependencies
```
pip install -r requirements.txt
```
_Since wiktextract is not regularly published as a Python package, the dependency is pinned to a specific commit. The commit indicated in `requirements.txt` was used and tested during development._
### 4. Load templates from dump files
Run the script `src/load_templates.py` to extract module and template pages from the dump file into an SQLite database that will be used by `wiktextract`.
```
python src/load_templates.py
```
### 5. Start the Flask app
```
flask --app src/app.py run
```
## Using Docker
Alternatively, the app can be containerized using Docker. You still have to provide the dump files in `dumps/`. Then perform the following two steps:
### 2. Build image
```
docker build -t live-query-wiktextract .
```
### 3. Run image
```
docker run -p 5000:80 live-query-wiktextract
```
......