mdb2es

command module
v1.0.0
Published: Jul 16, 2018 License: MIT Imports: 1 Imported by: 0

README

Backend for new archive site

Overview

Backend for the new archive site, including ETLs from the BB Metadata DB to Elasticsearch.

Commands

The archive-backend is meant to be run from the command line. Type archive-backend <command> -h to see how to use each command.

archive-backend server

Runs the backend API server for the new archive site.

archive-backend version

Prints the version of archive-backend.

Configuration

The default config file is config.toml in your current working directory.

See config.sample.toml for a sample config file.

Release and Deployment

Once development is done and all tests are green, we are ready to go live. To release, execute misc/release.sh.

To add a pre-release tag, set the relevant environment variable. For example,

PRE_RELEASE=rc.1 misc/release.sh

MDB models

When the MDB schema changes, the mdb package needs to be updated. Run this script:

misc/update_mdb_models.sh

(For Windows, see the "Elasticsearch installation for Windows" section below.)

http://mrzard.github.io/blog/2015/03/25/elasticsearch-enable-mlockall-in-centos-7/

Plugins

  1. Hebrew plugin: https://github.com/synhershko/elasticsearch-analysis-hebrew
  2. Phonetic plugin, used instead of the standard analyzer for exact match (so that הריון is treated the same as היריון):
     sudo bin/elasticsearch-plugin install analysis-phonetic

     https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-phonetic.html

     WIP - does not work yet.

  3. ICU plugin to transliterate Russian (and other languages) so that phonetic matching can be applied to them:
     sudo bin/elasticsearch-plugin install analysis-icu
  4. Ukrainian analyzer (fails for standard; not started).

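To confirm that the plugins above are actually installed, the output of `elasticsearch-plugin list` can be compared against the expected names. The following is a minimal sketch, not part of this project. It assumes the listing prints one plugin name per line, and the function names and the default path `bin/elasticsearch-plugin` are hypothetical. The Hebrew plugin is left out of the defaults because its listed name is not stated here.

```python
import subprocess


def missing_plugins(listing, required=("analysis-phonetic", "analysis-icu")):
    """Return the required plugin names that do not appear in the listing text.

    Assumes one plugin name per line, as printed by `elasticsearch-plugin list`.
    """
    installed = set(line.strip() for line in listing.splitlines() if line.strip())
    return [name for name in required if name not in installed]


def installed_listing(plugin_cmd="bin/elasticsearch-plugin"):
    """Run `<plugin_cmd> list` and return its output as text (hypothetical helper)."""
    return subprocess.check_output([plugin_cmd, "list"]).decode("utf-8")
```

Calling `missing_plugins(installed_listing())` from the Elasticsearch home directory would then return the names that still need to be installed.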
Build index

There are two more dependencies required to build the index:

  1. Open Office (soffice binary) - to convert all doc files to docx.
  2. python-docx Python library - to get text from docx:
  • pip install python-docx
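
How the two dependencies fit together can be sketched as follows. This is an illustration under assumptions, not the project's actual indexing code: the function names are hypothetical, and the conversion relies on the standard `--headless`, `--convert-to` and `--outdir` options of soffice.

```python
import subprocess


def convert_command(soffice_bin, doc_path, out_dir):
    """Build the soffice command line that converts one .doc file to .docx."""
    return [soffice_bin, "--headless", "--convert-to", "docx",
            "--outdir", out_dir, doc_path]


def convert_to_docx(soffice_bin, doc_path, out_dir):
    """Run the conversion; raises CalledProcessError if soffice exits non-zero."""
    subprocess.check_call(convert_command(soffice_bin, doc_path, out_dir))


def docx_text(docx_path):
    """Return the text of all body paragraphs in a .docx file."""
    # Imported lazily so this module loads even before python-docx is installed.
    from docx import Document
    return "\n".join(p.text for p in Document(docx_path).paragraphs)
```

Note that `paragraphs` covers body paragraphs only; text inside tables would need to be collected separately.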

Elasticsearch installation for Windows

  1. Download and install the Java Virtual Machine for Windows from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html


  1. Download and install the Elasticsearch 5.6.0 MSI from https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.0.msi

  2. Open CMD as administrator

    1. Go to Elasticsearch bin directory

      cd C:\Program Files\Elastic\Elasticsearch\bin
      
    2. To install analysis-phonetic type

      elasticsearch-plugin install analysis-phonetic
      
    3. To install the hebrew plugin type

      elasticsearch-plugin install https://bintray.com/synhershko/elasticsearch-analysis-hebrew/download_file?file_path=elasticsearch-analysis-hebrew-5.6.0.zip
      
    4. Answer 'y' to the security question

      Continue with installation? [y/N]y
      
    5. To install ICU plugin type

      elasticsearch-plugin install analysis-icu
      
  3. Download and install Python - version 2.7.x https://www.python.org/downloads/

  4. Install python-docx (to get text from docx):

    • in CMD go to python directory
    cd C:\Python27
    
    • and type
    python -m pip install python-docx
    
  5. Download and install LibreOffice (not OpenOffice!)

    https://www.libreoffice.org/donate/dl/win-x86_64/5.4.5/en-US/LibreOffice_5.4.5_Win_x64.msi

    In config.toml, [elasticsearch] section, set the 'soffice-bin' value to the full path of soffice.exe: "C://Program Files//LibreOffice 5//program//soffice.exe"

  6. Copy to config.toml the required commented-out lines from config.sample.toml that are related to Windows.

  7. Updating assets:

    To index data correctly, update the ES mapping configuration files (JSON files in /data/es/mappings):

    1. Run \es\mappings\make.py with Python from the root path of the project. For example:
      C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend>python C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend\es\mappings\make.py
      
    2. From the root path of the project, type:
      go-bindata -debug data/...
      
    3. Edit the bindata.go file (located in the root folder) and replace "package main" with "package bindata".
    4. Move the modified bindata.go file to the /bindata folder (delete the old bindata.go from /bindata if it exists, and make sure bindata.go no longer exists in the root folder).
    5. Repeat these steps any time make.py is changed and executed.
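
Steps 3 and 4 above are easy to get wrong by hand, so they can be scripted. The following is a minimal sketch, assuming the generated file is named bindata.go and sits in the project root; the function name `relocate_bindata` is hypothetical and not part of this project.

```python
import io
import os
import re


def relocate_bindata(project_root):
    """Rewrite the package clause of <root>/bindata.go and move it to <root>/bindata/."""
    src = os.path.join(project_root, "bindata.go")
    dst_dir = os.path.join(project_root, "bindata")
    dst = os.path.join(dst_dir, "bindata.go")

    with io.open(src, encoding="utf-8") as f:
        code = f.read()

    # Step 3: replace only the package clause, not other occurrences of "main".
    code, count = re.subn(r"(?m)^package main[ \t]*$", "package bindata", code, count=1)
    if count != 1:
        raise ValueError("no 'package main' clause found in " + src)

    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)

    # Step 4: writing to dst overwrites any old copy, then the root copy is removed.
    with io.open(dst, "w", encoding="utf-8") as f:
        f.write(code)
    os.remove(src)
    return dst
```

After running go-bindata from the project root, calling `relocate_bindata(".")` would leave a single bindata.go under /bindata.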

License

MIT


Directories

Path Synopsis
mdb
This file copies the Rambler parser into our project, since Rambler is a binary and cannot be imported.
