find

package
v0.0.9
Warning

This package is not in the latest version of its module.

Published: Jul 24, 2020 License: Apache-2.0 Imports: 15 Imported by: 2

Documentation

Overview

Package find provides functions for finding collections by partial match on the collection title and for parsing search queries.


Constants

const (
	MAX_RETURNED   = 50
	MIN_SIMILARITY = -4.75
	AVG_DOC_LEN    = 4497
	INTERCEPT      = -4.75 // From logistic regression
)

Variables

var (
	// From logistic regression
	WEIGHT = []float64{0.080, 2.327, 3.040} // [BM25 words, BM25 bigrams, bit vector]
)
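
The comments say that INTERCEPT and WEIGHT come from a logistic regression over three signals (BM25 on words, BM25 on bigrams, and a bit vector). The documentation does not show how they are combined, so the following is only a sketch that assumes a plain linear combination; the helper combinedScore and the lowercase names are hypothetical and not part of the package.

```go
package main

import "fmt"

// Values copied from the documented constants and variables.
const intercept = -4.75

var weight = []float64{0.080, 2.327, 3.040} // [BM25 words, BM25 bigrams, bit vector]

// combinedScore is a hypothetical helper. It ASSUMES the signals are combined
// as intercept + sum(weight[i] * feature[i]); the package may do otherwise.
func combinedScore(simWords, simBigram, simBitVector float64) float64 {
	features := []float64{simWords, simBigram, simBitVector}
	score := intercept
	for i, f := range features {
		score += weight[i] * f
	}
	return score
}

func main() {
	// A document that scores 1.0 on every signal.
	fmt.Printf("%.3f\n", combinedScore(1.0, 1.0, 1.0))
}
```

Under this assumption, a document with all three signals at zero scores exactly the intercept, which has the same value as MIN_SIMILARITY.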

Functions

This section is empty.

Types

type Collection

type Collection struct {
	GlossFile, Title string
}

type DictQueryParser

type DictQueryParser struct{ Tokenizer tokenizer.DictTokenizer }

func (DictQueryParser) ParseQuery

func (parser DictQueryParser) ParseQuery(query string) []TextSegment

ParseQuery splits the query text into segments using dictionary lookups.
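
The tokenizer's actual algorithm is not described here. As an illustration of what dictionary-based segmentation can look like, the following sketch uses greedy longest-match lookup; the function segment and the map-based dictionary are assumptions made for the example, not the package's implementation.

```go
package main

import "fmt"

// segment splits text into chunks by greedy longest-match lookup against dict.
// A rune that starts no dictionary entry is emitted on its own.
func segment(text string, dict map[string]bool) []string {
	runes := []rune(text)
	var out []string
	for i := 0; i < len(runes); {
		matched := 1 // fall back to a single rune
		// Try the longest candidate first, then shrink.
		for j := len(runes); j > i; j-- {
			if dict[string(runes[i:j])] {
				matched = j - i
				break
			}
		}
		out = append(out, string(runes[i:i+matched]))
		i += matched
	}
	return out
}

func main() {
	dict := map[string]bool{"你好": true, "世界": true}
	fmt.Println(segment("你好世界", dict))
}
```

Working on runes rather than bytes keeps multi-byte Chinese characters intact when slicing.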

type Document

type Document struct {
	GlossFile, Title, CollectionFile, CollectionTitle, ContainsWords string
	ContainsBigrams                                                  string
	SimTitle, SimWords, SimBigram, SimBitVector, Similarity          float64
	ContainsTerms                                                    []string
	MatchDetails                                                     fulltext.MatchingText
}

func (Document) String

func (doc Document) String() string

String returns a printable representation of the retrieved document's metadata.

type QueryParser

type QueryParser interface {
	ParseQuery(query string) []TextSegment
}

QueryParser parses input queries into a slice of text segments.
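
Because QueryParser is an interface, callers can supply their own implementation, for example in tests. The sketch below mirrors the interface locally with a trimmed TextSegment and adds a hypothetical whitespace-splitting parser named spaceParser; neither the local types nor spaceParser belong to the package.

```go
package main

import (
	"fmt"
	"strings"
)

// TextSegment is a trimmed local stand-in for the package's type.
type TextSegment struct {
	QueryText string
}

// QueryParser mirrors the documented interface.
type QueryParser interface {
	ParseQuery(query string) []TextSegment
}

// spaceParser is a hypothetical implementation that splits on whitespace.
type spaceParser struct{}

func (spaceParser) ParseQuery(query string) []TextSegment {
	var segs []TextSegment
	for _, field := range strings.Fields(query) {
		segs = append(segs, TextSegment{QueryText: field})
	}
	return segs
}

func main() {
	var p QueryParser = spaceParser{}
	for _, seg := range p.ParseQuery("heart sutra") {
		fmt.Println(seg.QueryText)
	}
}
```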

func MakeQueryParser

func MakeQueryParser(dict map[string]dicttypes.Word) QueryParser

MakeQueryParser creates a QueryParser from the given dictionary.

type QueryResults

type QueryResults struct {
	Query, CollectionFile        string
	NumCollections, NumDocuments int
	Collections                  []Collection
	Documents                    []Document
	Terms                        []TextSegment
}

func FindDocuments

func FindDocuments(ctx context.Context,
	dictSearcher *dictionary.Searcher,
	parser QueryParser, query string,
	advanced bool) (QueryResults, error)

FindDocuments returns a QueryResults object containing matching collections, documents, and dictionary words. For dictionary lookup, each text segment contains the QueryText searched for and possibly a matching dictionary entry. Only Chinese words found in the dictionary have matching dictionary entries. If the query contains no Chinese words, the Chinese word senses matching the English or Pinyin are included in the TextSegment.Senses field.

func FindDocumentsInCol

func FindDocumentsInCol(ctx context.Context,
	dictSearcher *dictionary.Searcher,
	parser QueryParser, query,
	col_gloss_file string) (QueryResults, error)

FindDocumentsInCol returns a QueryResults object containing matching collections, documents, and dictionary words within a specific collection. For dictionary lookup, each text segment contains the QueryText searched for and possibly a matching dictionary entry. Only Chinese words found in the dictionary have matching dictionary entries. If the query contains no Chinese words, the Chinese word senses matching the English or Pinyin are included in the TextSegment.Senses field.

type TextSegment

type TextSegment struct {
	QueryText string
	DictEntry dicttypes.Word
	Senses    []dicttypes.WordSense
}

A TextSegment contains the QueryText searched for and possibly a matching dictionary entry. Only Chinese words found in the dictionary have matching dictionary entries. Non-Chinese text, punctuation, and unknown Chinese words have an empty DictEntry, and any matching word senses are included in the Senses field.
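
A caller therefore has to check which of the two fields is populated. The following sketch shows one way to do that with simplified local stand-ins; the fields of Word and WordSense (Simplified, English) and the helper describe are assumptions for illustration and may not match the dicttypes package.

```go
package main

import "fmt"

// WordSense and Word are simplified stand-ins; their fields are assumptions.
type WordSense struct {
	Simplified, English string
}

type Word struct {
	Simplified string
	Senses     []WordSense
}

// TextSegment mirrors the documented struct using the stand-in types.
type TextSegment struct {
	QueryText string
	DictEntry Word
	Senses    []WordSense
}

// describe is a hypothetical helper reporting how a segment was resolved.
func describe(seg TextSegment) string {
	switch {
	case seg.DictEntry.Simplified != "":
		return "dictionary entry: " + seg.DictEntry.Simplified
	case len(seg.Senses) > 0:
		return fmt.Sprintf("%d matching senses", len(seg.Senses))
	default:
		return "no match"
	}
}

func main() {
	segs := []TextSegment{
		{QueryText: "你好", DictEntry: Word{Simplified: "你好"}},
		{QueryText: "hello", Senses: []WordSense{{Simplified: "你好", English: "hello"}}},
		{QueryText: "?"},
	}
	for _, seg := range segs {
		fmt.Println(seg.QueryText, "->", describe(seg))
	}
}
```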
