find

package
v0.0.54
Published: Feb 14, 2021 | License: Apache-2.0 | Imports: 15 | Imported by: 2

Documentation

Overview

Package find provides functions for finding documents by full-text search and for parsing search queries.

Index

Constants

This section is empty.

Variables

var WEIGHT = []float64{0.080, 2.327, 3.040} // [BM25 words, BM25 bigrams, bit vector]

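WEIGHT holds the weights for the BM25 word score, the BM25 bigram score, and the bit vector score, in that order. The values come from logistic regression.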
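The documentation does not say how these weights are applied. As a minimal sketch, assuming the three scores are combined as a plain weighted sum (a logistic regression model would normally also have an intercept and a sigmoid, neither of which is documented here), the idea might look like the following. The names weight and combineScores are hypothetical and are not part of the package.

```go
package main

import "fmt"

// weight mirrors the documented WEIGHT variable:
// [BM25 words, BM25 bigrams, bit vector].
var weight = []float64{0.080, 2.327, 3.040}

// combineScores is a hypothetical helper. It ASSUMES the three similarity
// scores are combined as a plain weighted sum, with no intercept and no
// sigmoid; the package may combine them differently.
func combineScores(simWords, simBigram, simBitVector float64) float64 {
	return weight[0]*simWords + weight[1]*simBigram + weight[2]*simBitVector
}

func main() {
	// Illustrative scores only; these are not real search results.
	fmt.Printf("combined score: %.3f\n", combineScores(1.0, 0.5, 1.0))
}
```

Under this assumption, a bigram or bit vector match moves the combined score far more than a single-word BM25 match does, which is consistent with the relative sizes of the three weights.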

Functions

This section is empty.

Types

type Collection

type Collection struct {
	GlossFile, Title string
}

type DatabaseDocFinder added in v0.0.17

type DatabaseDocFinder struct {
	// contains filtered or unexported fields
}

DatabaseDocFinder holds the stateful items needed for text search in the database.

func (DatabaseDocFinder) FindDocuments added in v0.0.17

func (df DatabaseDocFinder) FindDocuments(ctx context.Context,
	dictSearcher *dictionary.Searcher,
	parser QueryParser, query string,
	advanced bool) (*QueryResults, error)

FindDocuments returns a QueryResults object containing matching collections, documents, and dictionary words. For dictionary lookup, a text segment contains the QueryText searched for and possibly a matching dictionary entry. Matching dictionary entries exist only for Chinese words in the dictionary. If there are no Chinese words in the query, then the Chinese word senses matching the English or Pinyin are included in the TextSegment.Senses field.

func (DatabaseDocFinder) FindDocumentsInCol added in v0.0.17

func (df DatabaseDocFinder) FindDocumentsInCol(ctx context.Context,
	dictSearcher *dictionary.Searcher, parser QueryParser, query,
	col_gloss_file string) (*QueryResults, error)

FindDocumentsInCol returns a QueryResults object containing matching collections, documents, and dictionary words within a specific collection. For dictionary lookup, a text segment contains the QueryText searched for and possibly a matching dictionary entry. Matching dictionary entries exist only for Chinese words in the dictionary. If there are no Chinese words in the query, then the Chinese word senses matching the English or Pinyin are included in the TextSegment.Senses field.

func (DatabaseDocFinder) GetColMap added in v0.0.17

func (df DatabaseDocFinder) GetColMap() map[string]string

func (DatabaseDocFinder) GetDocFileMap added in v0.0.17

func (df DatabaseDocFinder) GetDocFileMap() map[string]string

func (DatabaseDocFinder) GetDocMap added in v0.0.17

func (df DatabaseDocFinder) GetDocMap() map[string]Document

func (DatabaseDocFinder) Inititialized added in v0.0.17

func (df DatabaseDocFinder) Inititialized() bool

type DictQueryParser

type DictQueryParser struct{ Tokenizer tokenizer.DictTokenizer }

func (DictQueryParser) ParseQuery

func (parser DictQueryParser) ParseQuery(query string) []TextSegment

ParseQuery parses the query text into segments using dictionary lookups.
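The exact segmentation algorithm is not described here. One common way to parse text by dictionary lookup is greedy longest match, sketched below under that assumption. The segment type is a simplified, hypothetical stand-in for TextSegment (it records only whether a dictionary entry was found), and parseQuery and the sample dictionary are illustrative, not the package's implementation.

```go
package main

import "fmt"

// segment is a simplified, hypothetical stand-in for TextSegment.
type segment struct {
	QueryText string
	InDict    bool // true when the text matched a dictionary headword
}

// parseQuery splits query by greedy longest match against dict. Characters
// with no dictionary match become single-rune segments with InDict false.
// This is an ASSUMED technique, not necessarily what DictQueryParser does.
func parseQuery(dict map[string]bool, query string) []segment {
	runes := []rune(query) // index by rune so multi-byte characters stay whole
	var segs []segment
	for i := 0; i < len(runes); {
		matched := false
		// Try the longest candidate first, then shrink.
		for j := len(runes); j > i; j-- {
			if w := string(runes[i:j]); dict[w] {
				segs = append(segs, segment{QueryText: w, InDict: true})
				i = j
				matched = true
				break
			}
		}
		if !matched {
			segs = append(segs, segment{QueryText: string(runes[i])})
			i++
		}
	}
	return segs
}

func main() {
	dict := map[string]bool{"你好": true, "世界": true, "世": true}
	for _, s := range parseQuery(dict, "你好世界!") {
		fmt.Printf("%s inDict=%v\n", s.QueryText, s.InDict)
	}
}
```

Longest match prefers 世界 over the shorter entry 世, and the trailing punctuation falls through as an unmatched segment, which parallels the description of TextSegment values without a dictionary entry.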

type DocFinder added in v0.0.17

type DocFinder interface {
	FindDocuments(ctx context.Context, dictSearcher *dictionary.Searcher,
		parser QueryParser, query string, advanced bool) (*QueryResults, error)
	FindDocumentsInCol(ctx context.Context, dictSearcher *dictionary.Searcher,
		parser QueryParser, query, col_gloss_file string) (*QueryResults, error)
	GetColMap() map[string]string
	GetDocMap() map[string]Document
	GetDocFileMap() map[string]string
	Inititialized() bool
}

DocFinder finds documents.

func NewDocFinder added in v0.0.17

func NewDocFinder(ctx context.Context, database *sql.DB) DocFinder

type DocTitleFinder added in v0.0.52

type DocTitleFinder interface {
	FindDocuments(ctx context.Context, query string) (*QueryResults, error)
}

DocTitleFinder finds documents by title.

func NewDocTitleFinder added in v0.0.52

func NewDocTitleFinder(r io.Reader) DocTitleFinder

type Document

type Document struct {
	GlossFile, Title, CollectionFile, CollectionTitle, ContainsWords string
	ContainsBigrams                                                  string
	SimTitle, SimWords, SimBigram, SimBitVector, Similarity          float64
	ContainsTerms                                                    []string
	MatchDetails                                                     fulltext.MatchingText
	TitleCNMatch                                                     bool
}

func (Document) String

func (doc Document) String() string

String returns a printable form of the retrieved document metadata.

type QueryParser

type QueryParser interface {
	ParseQuery(query string) []TextSegment
}

QueryParser parses input queries into a slice of text segments.

func MakeQueryParser

func MakeQueryParser(dict map[string]dicttypes.Word) QueryParser

MakeQueryParser creates a QueryParser from the given dictionary.

type QueryResults

type QueryResults struct {
	Query, CollectionFile        string
	NumCollections, NumDocuments int
	Collections                  []Collection
	Documents                    []Document
	Terms                        []TextSegment
	SimilarTerms                 []TextSegment
}

type TextSegment

type TextSegment struct {
	QueryText string
	DictEntry dicttypes.Word
	Senses    []dicttypes.WordSense
}

A TextSegment contains the QueryText searched for and possibly a matching dictionary entry. Matching dictionary entries exist only for Chinese words in the dictionary. Non-Chinese text, punctuation, and unknown Chinese words have nil DictEntry values, and any matching values are included in the Senses field.
