Negapedia refresh generator and development environment: this package and docker image are responsible for generating negapedia, a website on social data extracted from Wikipedia.
This image takes as input the nationalization (the Wikipedia language edition, such as en) and stores the result of the operations in /data (in-container folder). All data fetching operations are fully automated, and the result is the negapedia website in the form of a gzipped tarball of gzipped webpages. The operations flow is composed of five phases:
preprocessing of data: CPU intensive, requires a good internet connection and 16GB of RAM;
exporting to CSV: CPU intensive;
(optional) calculating TFIDF: CPU and IO intensive;
construction of the in-container database: IO intensive, requires 300GB of storage, preferably on an SSD;
exporting and compressing the static website by querying the database and the TFIDF data.
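The resource figures in the phase list can be checked before starting a long run. The following is a minimal sketch, not part of negapedia: the function names are hypothetical, it reads /proc/meminfo and so assumes a Linux host, and the directory to check is an assumption, since the database lives inside the container and therefore on whatever filesystem Docker uses for container data.

```shell
#!/bin/sh
# Pre-flight resource check (sketch). Thresholds come from the phase list:
# 16GB of RAM for preprocessing, 300GB of storage for the database.
# MemTotal is usually a little below the nominal size, so the RAM threshold
# is deliberately set to 15GB (an assumption, adjust as needed).
REQUIRED_RAM_KB=$((15 * 1024 * 1024))
REQUIRED_DISK_KB=$((300 * 1024 * 1024))

# Succeeds when the first number (available) is at least the second (required).
at_least() {
    [ "$1" -ge "$2" ]
}

# Total RAM in KiB, read from /proc/meminfo (Linux only).
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Free space in KiB on the filesystem that holds the given directory.
free_disk_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print one warning per unmet requirement for the given directory.
preflight() {
    dir=$1
    at_least "$(total_ram_kb)" "$REQUIRED_RAM_KB" ||
        echo "warning: less than 16GB of RAM"
    at_least "$(free_disk_kb "$dir")" "$REQUIRED_DISK_KB" ||
        echo "warning: less than 300GB free in $dir"
}
```

Calling `preflight /var/lib/docker` would check the location that Docker commonly uses on Linux; if your installation stores container data elsewhere, pass that directory instead.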
url: Output base URL, %s is the optional placeholder for subdomain, default http://%s.negapedia.org.
source: source of data (net or savepoint), default net.
keep: keep every savepoint after the execution (true or false), default false.
tfidf: calculate TFIDF; if false, try to use available precalculated measures (true or false), default false.
test: Run as test on a fraction of the articles before savepoint (true or false), default false.
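To see how these options might combine on one command line, here is a small sketch that only prints a command instead of running it. The helper refresh_cmd is hypothetical, and the sketch assumes that every option is passed in the same single-dash style as -lang, written as -name=value; check the tool's own help output before relying on that form.

```shell
#!/bin/sh
# Print (do not run) a refresh command assembled from the options listed above.
# Assumption: each option is passed as -name=value, like the -lang flag.
refresh_cmd() {
    lang=$1
    out_dir=$2
    shift 2
    printf 'docker run -v %s:/data --rm negapedia/negapedia refresh -lang %s' \
        "$out_dir" "$lang"
    # Remaining arguments are name=value pairs, one per option.
    for opt in "$@"; do
        printf ' -%s' "$opt"
    done
    printf '\n'
}

# A test run from the network that keeps its savepoints.
refresh_cmd en /path/2/out/dir source=net keep=true test=true
```

Printing the command first makes it easy to review the option values before committing to a run that can take a long time.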
Examples
docker run negapedia/negapedia refresh -lang en: basic usage, run the image on the English nationalization and store the result in the in-container /data folder.
docker run -v /path/2/out/dir:/data --rm negapedia/negapedia refresh -lang en:
..1. run the image as before.
..2. mount the host folder /path/2/out/dir, the output folder, as a volume on the guest /data folder, so that at the end of the operations /path/2/out/dir will contain the result. This folder can be changed to an arbitrary folder of your choice.
..3. remove the container right after the execution.
docker run -v /path/2/out/dir:/data --rm --init -d negapedia/negapedia refresh -lang en: the command you may want to use in practice:
..1. run the image as before.
..2. run an init process that will take care of reaping any zombie processes, just in case.
..3. run the container in detached mode.
For further explanations, please refer to the docker run reference.
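Because a detached run returns immediately, a script that needs the result has to wait for the container to stop. The following sketch shows one way to do that with docker wait; the function run_and_wait and the overridable DOCKER variable are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: start a detached refresh, wait for it to finish, report its status.
# DOCKER can be overridden (for example with a stub) to try the flow
# without a Docker daemon.
DOCKER=${DOCKER:-docker}

run_and_wait() {
    out_dir=$1
    lang=$2
    # With -d, docker run prints the ID of the new container.
    id=$("$DOCKER" run -v "$out_dir:/data" --rm --init -d \
        negapedia/negapedia refresh -lang "$lang") || return 1
    echo "started container $id"
    # docker wait blocks until the container stops, then prints its exit code.
    status=$("$DOCKER" wait "$id") || return 1
    echo "container exited with status $status"
    [ "$status" -eq 0 ]
}
```

A call such as `run_and_wait /path/2/out/dir en` succeeds only when the container exits with status 0. Note that with --rm the container and its logs are gone once it stops, so follow the logs while it runs if you need them.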
Useful commands
docker pull negapedia/negapedia: update the image to the latest revision.
docker kill --signal=SIGQUIT $(docker ps -ql): quit the last container and log its trace dump.
docker kill --signal=SIGUSR1 $(docker ps -ql): log the trace dump of the last container without quitting it.
docker logs -f $(docker ps -lq): follow the logs of the last container.
docker system prune -fa --volumes: remove all unused images and volumes without asking for confirmation.
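The commands above target the last created container, which may not be a negapedia one if other containers were started in the meantime. As a hedged alternative, the sketch below selects running containers by image using the ancestor filter of docker ps. The helper names are hypothetical, and the sketch assumes that docker ps lists the newest container first.

```shell
#!/bin/sh
# Sketch: address the negapedia container by image rather than by "last created".
# DOCKER can be overridden (for example with a stub) for a dry run.
DOCKER=${DOCKER:-docker}

# ID of the first running container listed for the negapedia image.
latest_negapedia() {
    "$DOCKER" ps -q --filter ancestor=negapedia/negapedia | head -n 1
}

# Ask that container to log its trace dump without quitting.
dump_trace() {
    id=$(latest_negapedia)
    [ -n "$id" ] || { echo "no negapedia container found" >&2; return 1; }
    "$DOCKER" kill --signal=SIGUSR1 "$id"
}
```

Swapping SIGUSR1 for SIGQUIT in dump_trace would give the quitting variant described above.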