blevebench

package module

v0.0.0-...-023122a Latest Latest Go to latest Published: Mar 15, 2019 License: Apache-2.0 Imports: 10 Imported by: 3

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/blevesearch/bleve-bench

Links

Open Source Insights

README ¶

bleve-bench

A utility for benchmarking bleve performance under various configurations and workloads.

The bleve-bench utility works by reading wikipedia articles from a previously generated file with a single article per line. This minimizes time spent reading articles.

The tool performs a number of operations in a given level, and then prints out summary statistics about the performance in this level. For example, with batch size 100 and level size 1000, it will:

Load documents individually
Load batches of until end of level
Run term search for "water"
Run <qrepeat - 1> term searches for "water"
Print one CSV row for this level

The total execution time can be a useful metric, but this is not an attempt to load as many articles as possible as fast as possible. Rather this tool is useful for seeing how the performance changes over time as the number of documents indexed grows.

Output Format

elapsed,docs,avg_single_doc_ms,avg_batched_doc_ms,query_water_matches,first_query_water_ms,avg_repeated5_query_water_ms

Running

This will download the wikipedia dataset if you don't have it. Then it will build the linefile utility. Then it will run the linefile utility on the wikipedia dataset. NOTE: the download is large and may take a long time (this only happens the first time)

	make wikilinefile

Build

	go build

To build it with support for some optional C-based storage engines

	go build -tags 'leveldb'

Run the benchmark with all defaults:

	./bleve-bench

Usage

	Usage of ./bleve-bench:
	  -batch=100: batch size
	  -config="": configuration file to use
	  -count=100000: total number of documents to process
	  -cpuprofile="": write cpu profile to file
	  -level=1000: report level
	  -memprofile="": write memory profile every level
	  -qrepeat=5: query repeat
	  -source="tmp/enwiki.txt": wikipedia line file
	  -target="bench.bleve": target index filename

Examples

Load 100000 articles, with all the defaults.

	./bleve-bench

Load 3000 articles using the leveldb backend and dump a cpu-profile at the end.

	./bleve-bench -config configs/leveldb.json -count 3000 -cpuprofile=leveldb.profile

Load 3000 articles using the leveldb backend and dump a memory profile after every level.

	./bleve-bench -config configs/leveldb.json -count 3000 -memprofile=leveldb-mem.profile

Conclusions

What kind of conclusions can we draw from this utility? Here is a chart produced using this utility to load 100k wikipedia documents into bleve using the LevelDB backend.

This shows several important things:

Indexing in a batch is faster than indexing individually.
Indexing performance remained consistent as we added more and more documents.
Term query time increases nearly linearly with respect to the number of documents matching the term. The query behavior becomes more erratic and changes slope around 50k documents.
The first query seems to perform better than the average of running the query five times. This is surprising and warrants further exploration.

Here is the same test run with the BoltDB backend.

This too shows several important things:

Indexing individual documents is considerably more expensive than a batch.
But both perform somewhat consistently through the test.
Query performance appears both more consistent and faster with BoltDB compared to LevelDB.

Documentation ¶

Index ¶

func BuildArticleMapping() mapping.IndexMapping
type Article
type BenchConfig
- func LoadConfigFile(path string) *BenchConfig
type WikiReader
- func NewWikiReader(path string) (*WikiReader, error)
- func (w *WikiReader) Close() error
- func (w *WikiReader) Next() (*Article, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func BuildArticleMapping ¶

func BuildArticleMapping() mapping.IndexMapping

BuildArticleMapping returns a mapping for indexing wikipedia articles in a manner similar to that done by lucene nightly benchmarks

Types ¶

type Article ¶

type Article struct {
	Title string `json:"title"`
	Text  string `json:"text"`
}

type BenchConfig ¶

type BenchConfig struct {
	IndexType string                 `json:"index_type"`
	KVStore   string                 `json:"kvstore"`
	KVConfig  map[string]interface{} `json:"kvconfig"`
}

func LoadConfigFile ¶

func LoadConfigFile(path string) *BenchConfig

type WikiReader ¶

type WikiReader struct {
	// contains filtered or unexported fields
}

func NewWikiReader ¶

func NewWikiReader(path string) (*WikiReader, error)

func (*WikiReader) Close ¶

func (w *WikiReader) Close() error

func (*WikiReader) Next ¶

func (w *WikiReader) Next() (*Article, error)

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
cmd
bbaggregate
bbrunner
bleve-analyzer
bleve-bench
bleve-blast
bleve-query

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL