mdb2es

command module
v1.13.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 5, 2021 License: MIT Imports: 1 Imported by: 0

README

Backend for new archive site

Overview

Backend for new archive site including, ETLs from BB Metadata DB to Elasticsearch.

Commands

The archive-backend is meant to be executed as command line. Type archive-backend <command> -h to see how to use each command.

archive-backend server

Execute the backend api server for the new archive site.

archive-backend version

Print the version of archive-backend

Configuration

The default config file is config.toml in your current work directory.

See config.sample.toml for a sample config file.

Release and Deployment

Once development is done, all tests are green, we want to go live. All we have to do is simply execute misc/release.sh.

To add a pre-release tag, add the relevant environment variable. For example,

PRE_RELEASE=rc.1 misc/release.sh

MDB models

When MDB schema is changed we need to update the mdb package. Run this script:

misc/update_mdb_models.sh

(See the next section below for the instructions on installing Elasticsearch for Windows)

http://mrzard.github.io/blog/2015/03/25/elasticsearch-enable-mlockall-in-centos-7/

Build index

There are two more dependencies required to build index:

  1. Open Office (soffice binary) - to convert all doc to docx.
  2. python-docx pyton library - to get text from docx
  • pip install python-docx

Elasticsearch installation for Windows

  1. Disable Go Modules.

    • Type from within the repository directory:
    go env -w  GO111MODULE=off
    
  2. Download and install the Java Virtual Machine for Windows from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

alt text

  1. Download Elasticsearch 6.8.2 from https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.2.zip

    Extract it to C:\elasticsearch-6.8.2

    If Elasticsearch keep crashing, consider adding these lines to jvm.options file in C:\elasticsearch-6.8.2\config

    • -Xms2g
    • -Xmx2g

    Xms represents the initial size of total heap space. Xmx represents the maximum size of total heap space.

  2. Install the Hebrew dictionaries:

    • Download: he_IL.aff, he_IL.dic, settings.yml files from https://github.com/elastic/hunspell/tree/master/dicts/he_IL
    • Put these files under C:\elasticsearch-6.8.2\config\hunspell\he_IL
    • Additional dictionary terms are managed in .delta.dic files and located in search/hunspell directory of the repository. Copy these files into corresponding folders inside elasticsearch-6.8.2/config/hunspell/
    • Note that on Linux environment the supported format for Hebrew dictionary files is ISO 8859-8.
  3. Download and install Python - version 2.7.x https://www.python.org/downloads/

  4. Install python-docx (to get text from docx):

    • in CMD go to python directory
    cd C:\Python27
    
    • and type
    python -m pip install python-docx
    
  5. Download and install LibreOffice (not OpenOffice!)

    https://www.libreoffice.org/donate/dl/win-x86_64/5.4.5/en-US/LibreOffice_5.4.5_Win_x64.msi

    Update 'soffice-bin' value with soffice.exe full path in config.toml, [elasticsearch] section: "C://Program Files//LibreOffice 5//program//soffice.exe"

  6. Copy to config.toml the required commented-out lines from config.sample.toml that are related to Windows.

  7. Updating assets:

    In order to make correct data indexing you should update the ES mapping configuration files (JSON files in /data/es/mappings):

    1. Exec. \es\mappings\make.py with python from the root path of the project. For example:
      C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend>python C:\Users\[USER]\go\src\github.com\Bnei-Baruch\archive-backend\es\mappings\make.py
      
    2. From the root path of the project, type:
      go-bindata -debug data/...
      
    3. Edit bindata.go file (located in the root folder) and replace "package main" with "package bindata".
    4. Move the modified bindata.go file to /bindata folder (delete old bindata.go from /bindata if exist and make sure the bindata.go is not exist any more in the root folder).
    5. Repeat this steps any time make.py is changed and executed.

Install go-bindata

go get -u github.com/jteeuwen/go-bindata/...

License

MIT

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
mdb
This file takes Rambler parser to our project as Rambler is binary and cannot be imported.
This file takes Rambler parser to our project as Rambler is binary and cannot be imported.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL