Backend for new archive site
Overview
Backend for the new archive site, including ETLs from the BB Metadata DB to Elasticsearch.
Commands
archive-backend is meant to be run from the command line.
Type archive-backend <command> -h to see how to use each command.
archive-backend server
Run the backend API server for the new archive site.
archive-backend version
Print the version of archive-backend.
Configuration
The default config file is config.toml in your current working directory.
See config.sample.toml for a sample config file.
Release and Deployment
Once development is done and all tests are green, we are ready to go live.
To release, run misc/release.sh.
To add a pre-release tag, set the PRE_RELEASE environment variable. For example:
PRE_RELEASE=rc.1 misc/release.sh
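How the tag ends up in the released version is decided by misc/release.sh, which is not reproduced here. As an illustration only, the sketch below assumes the script follows the Semantic Versioning convention of appending the pre-release identifier after a hyphen. The function name with_pre_release and the base version "1.2.3" are hypothetical and do not come from the project.

```python
import os


def with_pre_release(base_version, pre_release=None):
    """Return base_version with an optional SemVer pre-release suffix.

    Assumption: the release script composes versions as <base>-<pre-release>.
    """
    if pre_release is None:
        # Fall back to the environment, as in `PRE_RELEASE=rc.1 misc/release.sh`.
        pre_release = os.environ.get("PRE_RELEASE", "")
    pre_release = pre_release.strip()
    if not pre_release:
        # No tag requested: this is a regular release.
        return base_version
    return "{}-{}".format(base_version, pre_release)


if __name__ == "__main__":
    # "1.2.3" is a made-up base version used only for illustration.
    print(with_pre_release("1.2.3"))
```

Run with PRE_RELEASE unset, the sketch prints the base version unchanged. Check misc/release.sh itself for the actual behaviour.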
MDB models
When the MDB schema changes, the mdb package must be updated. Run this script:
misc/update_mdb_models.sh
(See the "Elasticsearch installation for Windows" section below for instructions on installing Elasticsearch on Windows.)
Enabling mlockall on CentOS 7:
http://mrzard.github.io/blog/2015/03/25/elasticsearch-enable-mlockall-in-centos-7/
Plugins
- Hebrew plugin:
https://github.com/synhershko/elasticsearch-analysis-hebrew
- Phonetic plugin, to use instead of the standard analyzer for exact match (so that הריון is treated the same as היריון):
sudo bin/elasticsearch-plugin install analysis-phonetic
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html
WIP - does not work yet.
- ICU plugin to transliterate Russian (and others) to enable phonetic on them:
sudo bin/elasticsearch-plugin install analysis-icu
- Ukrainian analyzer (fails for standard - not started)
Build index
Two more dependencies are required to build the index:
- Open Office (soffice binary) - to convert all doc files to docx.
- python-docx Python library - to extract text from docx files.
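The way these two dependencies fit together can be sketched as follows. This is a minimal illustration, not the project's actual indexing code: the function names are hypothetical, the soffice options used (--headless, --convert-to, --outdir) are the standard LibreOffice/OpenOffice command-line conversion flags, and the text is read through python-docx's Document and paragraphs API.

```python
import os


def build_convert_command(soffice_bin, doc_path, out_dir):
    """Build the soffice command line that converts one .doc file to .docx."""
    return [
        soffice_bin,
        "--headless",            # run without opening a GUI window
        "--convert-to", "docx",  # target format
        "--outdir", out_dir,     # directory that receives the converted file
        doc_path,
    ]


def converted_path(doc_path, out_dir):
    """Path where soffice writes the result: same base name, .docx extension."""
    base = os.path.splitext(os.path.basename(doc_path))[0]
    return os.path.join(out_dir, base + ".docx")


def extract_text(docx_path):
    """Return the body paragraphs of a .docx file, one paragraph per line."""
    import docx  # provided by the python-docx package; imported lazily

    document = docx.Document(docx_path)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


if __name__ == "__main__":
    import subprocess
    import sys

    # Usage: python doc_to_text.py <soffice binary> <file.doc> <output dir>
    soffice_bin, doc_path, out_dir = sys.argv[1:4]
    subprocess.check_call(build_convert_command(soffice_bin, doc_path, out_dir))
    print(extract_text(converted_path(doc_path, out_dir)))
```

Note that document.paragraphs covers body paragraphs only. Text inside tables, headers, or footers would need extra handling.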
Elasticsearch installation for Windows
- Download and install the Java Virtual Machine for Windows from
http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
- Download and install the Elasticsearch 5.6.0 MSI from
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.0.msi
- Open CMD as administrator
- Go to the Elasticsearch bin directory
cd C:\Program Files\Elastic\Elasticsearch\bin
- To install analysis-phonetic, type:
elasticsearch-plugin install analysis-phonetic
- To install the Hebrew plugin, type:
elasticsearch-plugin install https://bintray.com/synhershko/elasticsearch-analysis-hebrew/download_file?file_path=elasticsearch-analysis-hebrew-5.6.0.zip
- Answer 'y' to the security question:
Continue with installation? [y/N]y
- To install the ICU plugin, type:
elasticsearch-plugin install analysis-icu
- Download and install Python, version 2.7.x:
https://www.python.org/downloads/
- Install python-docx (to extract text from docx). In CMD, go to the Python directory and run pip:
cd C:\Python27
python -m pip install python-docx
- Download and install LibreOffice (not OpenOffice!):
https://www.libreoffice.org/donate/dl/win-x86_64/5.4.5/en-US/LibreOffice_5.4.5_Win_x64.msi
Then set the 'soffice-bin' value in the [elasticsearch] section of config.toml to the full path of soffice.exe:
"C://Program Files//LibreOffice 5//program//soffice.exe"
- Copy the required Windows-related lines, which are commented out in config.sample.toml, into config.toml.
- Updating assets:
For data to be indexed correctly, update the ES mapping configuration files (JSON files in /data/es/mappings):
- Run \es\mappings\make.py with Python from the root path of the project. For example:
C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend>python C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend\es\mappings\make.py
- From the root path of the project, type:
go-bindata -debug data/...
- Edit the bindata.go file (located in the root folder) and replace "package main" with "package bindata".
- Move the modified bindata.go file to the /bindata folder (delete the old bindata.go from /bindata if it exists, and make sure bindata.go no longer exists in the root folder).
- Repeat these steps whenever make.py is changed and executed.
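Since the edit-and-move steps have to be repeated every time, they are easy to script. The following is a minimal sketch, assuming it is run from the project root right after go-bindata has generated bindata.go there. The function name relocate_bindata is hypothetical and is not part of the repository.

```python
import io
import os


def relocate_bindata(project_root):
    """Switch bindata.go to package bindata and move it into bindata/."""
    source = os.path.join(project_root, "bindata.go")
    target_dir = os.path.join(project_root, "bindata")
    target = os.path.join(target_dir, "bindata.go")

    with io.open(source, encoding="utf-8") as handle:
        code = handle.read()

    # Replace only the first occurrence, assumed to be the package clause.
    code = code.replace("package main", "package bindata", 1)

    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    # Opening in "w" mode overwrites any old bindata/bindata.go.
    with io.open(target, "w", encoding="utf-8") as handle:
        handle.write(code)

    # Make sure bindata.go no longer exists in the root folder.
    os.remove(source)
    return target


if __name__ == "__main__":
    # Run from the project root after `go-bindata -debug data/...`.
    print(relocate_bindata(os.getcwd()))
```

The script raises an error if bindata.go is missing from the root folder, which is a useful sign that the go-bindata step was skipped.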
License
MIT