search

Published: Jan 20, 2023 License: Apache-2.0, MIT Imports: 3 Imported by: 0

README

Crane 🐦

My blog post: WebAssembly Search Tools for Static Sites


Crane is a technical demo inspired by Stork, and it uses a near-identical configuration file setup. So it had to be named after a bird too.

I wrote it to help me understand how WebAssembly search tools work. Please use Stork instead.

Crane is two programs. The first program scans a group of documents and builds an efficient index. 1MB of text and metadata is turned into a 25KB index (14KB gzipped). The second program is a Wasm module that is sent to the browser along with a little bit of JavaScript glue code and the index. The result is an instant search engine that helps users find web pages as they type.

Visit the demo


Crane instant search in action


The full-text search engine is powered in part by code from Artem Krylysov's blog post Let's build a Full-Text Search engine.

No effort has been made to shrink the Wasm binary. See Reducing the size of Wasm files.

Use it

Describe your document files and their metadata.

[input]
files = [
    { path = "docs/essays/essay01.txt", url = "essays/essay01.txt", title = "Introduction" },
    # etc.
]

[output]
filename = "dist/federalist.crane"

Pass the configuration file to the index build script. Rebuild the index whenever your documents change; the Wasm module only needs to be built once.

./build-index.sh federalist.toml
./build-search.sh

Host the files from /dist on your website (e.g. wasm_exec.js, crane.js, crane.wasm, federalist.crane). And away you go!

const crane = new Crane("crane.wasm", "federalist.crane");
await crane.load();

const results = crane.query('some keywords');
console.log(results);

See the demo inside /docs for a basic UI.


Build demo page

./gh-pages.sh

Documentation

Functions

func Analyze

func Analyze(text string) []string

Analyze analyzes the text and returns a slice of tokens.
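
As a rough illustration of what an analysis step like this typically does, here is a minimal, self-contained sketch that chains tokenizing, lower-casing, and stop word removal. The lower-case function names, the tiny stop word list, and the tokenizing rule (split on anything that is not a letter or digit) are assumptions made for this example, not the package's actual implementation. Stemming is left out because it would need a stemming library.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on any rune that is not a letter or a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
}

// lowercase returns a new slice with every token lower-cased.
func lowercase(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = strings.ToLower(t)
	}
	return out
}

// stopwords is a tiny illustrative list (an assumption for this sketch).
var stopwords = map[string]struct{}{
	"a": {}, "and": {}, "in": {}, "of": {}, "the": {}, "to": {},
}

// removeStopwords drops every token found in the stopwords set.
func removeStopwords(tokens []string) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if _, ok := stopwords[t]; !ok {
			out = append(out, t)
		}
	}
	return out
}

// analyze chains the steps: tokenize, lowercase, remove stop words.
func analyze(text string) []string {
	return removeStopwords(lowercase(tokenize(text)))
}

func main() {
	fmt.Println(analyze("The Powers of the Union, in Detail")) // prints [powers union detail]
}
```

Lower-casing before stop word removal matters here: the stop word set only holds lower-case entries, so "The" would otherwise slip through.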

func Intersection

func Intersection(a []int, b []int) []int

Intersection returns the set intersection of a and b. Both a and b must be sorted in ascending order and contain no duplicates.
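
The sorted, duplicate-free precondition is what makes a single linear pass possible. The sketch below shows one common way to do it with two cursors; the function name intersect is hypothetical and the code is an illustration under that precondition, not necessarily how the package implements it.

```go
package main

import "fmt"

// intersect returns the values present in both a and b.
// Both inputs must be sorted ascending and contain no duplicates.
func intersect(a, b []int) []int {
	out := []int{}
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++ // a[i] cannot appear in the rest of b
		case a[i] > b[j]:
			j++ // b[j] cannot appear in the rest of a
		default:
			out = append(out, a[i]) // match: keep it and advance both
			i++
			j++
		}
	}
	return out
}

func main() {
	fmt.Println(intersect([]int{1, 3, 5, 8}, []int{3, 4, 8, 9})) // prints [3 8]
}
```

Each step advances at least one cursor, so the loop does at most len(a)+len(b) comparisons.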

func LowercaseFilter

func LowercaseFilter(tokens []string) []string

LowercaseFilter returns a slice of tokens normalized to lower case.

func StemmerFilter

func StemmerFilter(tokens []string) []string

StemmerFilter returns a slice of stemmed tokens.

func StopwordFilter

func StopwordFilter(tokens []string) []string

StopwordFilter returns a slice of tokens with stop words removed.

func Tokenize

func Tokenize(text string) []string

Tokenize returns a slice of tokens for the given text.

Types

type Document

type Document struct {
	Title string
	URL   string
	Text  string
	ID    int
}

Document represents a text file.

type Index

type Index map[string][]int

Index is an inverted index. It maps tokens to document IDs.

func (Index) Add

func (index Index) Add(docs []Document)

Add adds documents to the index.

func (Index) Search

func (index Index) Search(text string) []int

Search queries the index for the given text.
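
To make the inverted index idea concrete, here is a small self-contained sketch of adding documents and querying them. It assumes that a query matches only documents containing every query token (the approach taken in the blog post credited in the README), and it swaps in a trivial whitespace tokenizer for the real analyzer. All lower-case names are hypothetical stand-ins for this example.

```go
package main

import (
	"fmt"
	"strings"
)

type document struct {
	ID   int
	Text string
}

// index maps a token to the sorted IDs of documents containing it.
type index map[string][]int

// tokens is a stand-in analyzer: lower-case, then split on whitespace.
func tokens(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// add records each document ID under every token in its text.
// Documents are assumed to arrive in ascending ID order, which
// keeps every ID list sorted and free of duplicates.
func (idx index) add(docs []document) {
	for _, doc := range docs {
		for _, tok := range tokens(doc.Text) {
			ids := idx[tok]
			if len(ids) > 0 && ids[len(ids)-1] == doc.ID {
				continue // token already recorded for this document
			}
			idx[tok] = append(ids, doc.ID)
		}
	}
}

// intersect returns the values common to two sorted ID lists.
func intersect(a, b []int) []int {
	out := []int{}
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// search returns the IDs of documents that contain every query token.
func (idx index) search(text string) []int {
	var result []int
	for i, tok := range tokens(text) {
		ids, ok := idx[tok]
		if !ok {
			return nil // an unknown token means no document can match
		}
		if i == 0 {
			result = append([]int(nil), ids...) // copy so callers cannot alter the index
			continue
		}
		result = intersect(result, ids)
	}
	return result
}

func main() {
	idx := index{}
	idx.add([]document{
		{ID: 0, Text: "the union and its powers"},
		{ID: 1, Text: "powers of the senate"},
		{ID: 2, Text: "the union as a safeguard"},
	})
	fmt.Println(idx.search("union powers")) // prints [0]
}
```

Because the ID lists stay sorted, each extra query token costs only one linear intersection pass.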

type Result

type Result struct {
	Title string
	URL   string
	ID    int
}

Result is a search result item.

type Store

type Store struct {
	Index   Index
	Results []Result
}

Store contains results and their index.
