solrbulk

package module
v0.2.2
Published: Dec 1, 2016 License: GPL-3.0 Imports: 7 Imported by: 0

README

solrbulk

Motivation:

Sometimes you need to index a bunch of documents really, really fast. Even with Solr 4.0 and soft commits, if you send one document at a time you will be limited by the network. The solution is two-fold: batching and multi-threading. http://lucidworks.com/blog/high-throughput-indexing-in-solr/

solrbulk expects as input a file with line-delimited JSON. Each line represents a single document. solrbulk takes care of reformatting the documents into the bulk JSON format that SOLR understands.

solrbulk will send documents in batches and in parallel. The number of documents per batch can be set via -size, the number of workers with -w.
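
For illustration, here is a minimal Go sketch of that idea: read line-delimited JSON from stdin, group the lines into batches, and POST each batch as a JSON array to Solr's JSON update handler, followed by a final commit. This is only a sketch of the approach, not solrbulk's actual implementation; the URL and batch size are placeholder assumptions.

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"os"
	"strings"
)

// postBatch sends a slice of JSON documents to Solr as a single JSON array.
func postBatch(url string, docs []string) error {
	body := "[" + strings.Join(docs, ",") + "]"
	resp, err := http.Post(url, "application/json", strings.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("solr responded with %s", resp.Status)
	}
	return nil
}

func main() {
	// Placeholder update URL; adjust host, port and collection as needed.
	url := "http://localhost:8983/solr/collection1/update"
	var batch []string
	// Note: very long documents may require a larger scanner buffer.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		batch = append(batch, scanner.Text())
		if len(batch) == 1000 {
			if err := postBatch(url, batch); err != nil {
				log.Fatal(err)
			}
			batch = nil
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if len(batch) > 0 {
		if err := postBatch(url, batch); err != nil {
			log.Fatal(err)
		}
	}
	// A final commit makes the newly indexed documents searchable.
	resp, err := http.Get(url + "?commit=true")
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}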

Installation

$ go get github.com/miku/solrbulk/cmd/...

There are also DEB and RPM packages available: https://github.com/miku/solrbulk/releases/

Usage

$ solrbulk
Usage of solrbulk:
  -collection string
      SOLR core / collection
  -commit int
      commit after this many docs (default 1000000)
  -cpuprofile string
      write cpu profile to file
  -host string
      SOLR host (default "localhost")
  -memprofile string
      write heap profile to file
  -port int
      SOLR port (default 8983)
  -reset
      remove all docs from index
  -server string
      url to SOLR server, including host, port and path to collection
  -size int
      bulk batch size (default 1000)
  -v  prints current program version
  -verbose
      output basic progress
  -w int
      number of workers to use (default 4)
  -z  unzip gz'd file on the fly

Example

$ cat file.ldj
{"id": "1", "state": "Alaska"}
{"id": "2", "state": "California"}
{"id": "3", "state": "Oregon"}
...

$ solrbulk -verbose -server 192.168.1.222:8085/collection1 file.ldj
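
The flag values below are only illustrative; tune workers, batch size and commit interval to your setup. With -z, solrbulk reads a gzip-compressed input directly:

$ solrbulk -verbose -z -w 8 -size 20000 -commit 1000000 -server 192.168.1.222:8085/collection1 file.ldj.gz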

Some performance observations

  • Having as many workers as cores is generally a good idea. However, the returns seem to diminish fast with more cores.
  • Disable autoCommit, autoSoftCommit and the transaction log in solrconfig.xml (see the sample snippet after this list).
  • Use a high number for -commit. solrbulk will issue a final commit request at the end of processing anyway.
  • For some use cases, the bulk indexing approach is about twice as fast as a standard request to /solr/update.
  • On machines with more cores, try to increase maxIndexingThreads.
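
As a rough sketch, the relevant solrconfig.xml settings might look like the following. The element names are taken from Solr 4.x-era configurations and may differ in newer versions, so treat this as a starting point rather than a drop-in config.

<!-- indexConfig: allow more indexing threads on machines with many cores -->
<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- a negative maxTime disables automatic commits -->
  <autoCommit>
    <maxTime>-1</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
  <!-- comment out or remove updateLog to disable the transaction log -->
  <!-- <updateLog/> -->
</updateHandler>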

Elasticsearch?

Try esbulk.

Documentation

Overview

Copyright 2015 by Leipzig University Library, http://ub.uni-leipzig.de
               by The Finc Authors, http://finc.info
               by Martin Czygan, <martin.czygan@uni-leipzig.de>

This file is part of solrbulk.

solrbulk is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

solrbulk is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with solrbulk. If not, see <http://www.gnu.org/licenses/>.

@license GPL-3.0+ <http://spdx.org/licenses/GPL-3.0+>

Index

Constants

const Version = "0.2.2"

Version.

Variables

This section is empty.

Functions

func BulkIndex

func BulkIndex(docs []string, options Options) error

BulkIndex takes a set of documents as strings and indexes them into SOLR.

func Worker

func Worker(id string, options Options, lines chan string, wg *sync.WaitGroup)

Worker will batch index documents from lines channel.

Types

type Options

type Options struct {
	Host       string
	Port       int
	Collection string
	BatchSize  int
	CommitSize int
	Verbose    bool
	Server     string
}

Options holds bulk indexing options.
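
As a rough usage sketch of the package API: the concrete values are illustrative, and it is assumed here that Worker calls wg.Done once the lines channel is closed and drained; the final commit is issued separately by the command-line tool, so a library caller may need to commit on their own.

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sync"

	"github.com/miku/solrbulk"
)

func main() {
	opts := solrbulk.Options{
		Host:       "localhost",
		Port:       8983,
		Collection: "collection1",
		BatchSize:  1000,
		CommitSize: 1000000,
		Verbose:    true,
	}

	lines := make(chan string)
	var wg sync.WaitGroup

	// Start a few workers that batch and index documents from the channel.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go solrbulk.Worker(fmt.Sprintf("worker-%d", i), opts, lines, &wg)
	}

	// Feed line-delimited JSON documents into the channel.
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		lines <- scanner.Text()
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	close(lines)
	wg.Wait()
}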

Directories

Path Synopsis
cmd
