find

package
v0.0.94 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 1, 2022 | License: Apache-2.0 | Imports: 15 | Imported by: 2

Documentation

Overview

Package find provides functions for finding documents by full-text search and for parsing search queries.

Constants

This section is empty.

Variables

var WEIGHT = []float64{0.080, 2.327, 3.040} // [BM25 words, BM25 bigrams, bit vector]

WEIGHT holds the weights for the BM25 words, BM25 bigrams, and bit vector scores, obtained from logistic regression.
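The documentation does not state how these weights are applied. As a minimal sketch, assuming they are combined as a plain weighted sum of the three per-document scores, the calculation could look like the following. The helper name combine and the sample scores are hypothetical and not part of the package.

```go
package main

import "fmt"

// weight mirrors the documented WEIGHT values.
// Order: BM25 words, BM25 bigrams, bit vector.
var weight = []float64{0.080, 2.327, 3.040}

// combine is a hypothetical helper (not part of the package) that
// forms a weighted sum of the three similarity scores. Whether the
// package combines them exactly this way is an assumption.
func combine(simWords, simBigram, simBitVector float64) float64 {
	return weight[0]*simWords + weight[1]*simBigram + weight[2]*simBitVector
}

func main() {
	// Made-up scores, for illustration only.
	fmt.Println(combine(1.0, 0.5, 0.25))
}
```

With these values the bigram and bit vector scores dominate the result, since their weights are far larger than the weight for single words.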

Functions

func LoadDocInfo added in v0.0.60

func LoadDocInfo(r io.Reader) (map[string]DocInfo, map[string]DocInfo)

LoadDocInfo loads title info for all documents.

Types

type Collection

type Collection struct {
	GlossFile, Title string
}

type DictQueryParser

type DictQueryParser struct{ Tokenizer tokenizer.DictTokenizer }

func (DictQueryParser) ParseQuery

func (parser DictQueryParser) ParseQuery(query string) []TextSegment

ParseQuery parses the query text using dictionary lookups.
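To illustrate what parsing by dictionary lookup can mean, here is a minimal sketch of greedy longest-match segmentation. It is not the package's implementation: the segment type, its InDict field, and parseByLookup are hypothetical stand-ins, and the dictionary is reduced to a set of strings.

```go
package main

import "fmt"

// segment is a simplified stand-in for TextSegment. InDict is a
// hypothetical field recording whether the text was found in the dictionary.
type segment struct {
	QueryText string
	InDict    bool
}

// parseByLookup splits query into segments using greedy longest-match
// lookups against dict. A rune that starts no dictionary word becomes
// a single-rune segment with InDict false.
func parseByLookup(dict map[string]bool, query string) []segment {
	runes := []rune(query)
	var segs []segment
	for i := 0; i < len(runes); {
		matched := false
		// Try the longest candidate first, then shrink it.
		for j := len(runes); j > i; j-- {
			if w := string(runes[i:j]); dict[w] {
				segs = append(segs, segment{QueryText: w, InDict: true})
				i = j
				matched = true
				break
			}
		}
		if !matched {
			segs = append(segs, segment{QueryText: string(runes[i])})
			i++
		}
	}
	return segs
}

func main() {
	dict := map[string]bool{"你好": true, "世界": true}
	for _, s := range parseByLookup(dict, "你好,世界") {
		fmt.Printf("%q inDict=%v\n", s.QueryText, s.InDict)
	}
}
```

The sketch works on runes rather than bytes so that multi-byte Chinese characters are never split.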

type DocFinder added in v0.0.17

type DocFinder interface {
	FindDocuments(ctx context.Context, dictSearcher dictionary.ReverseIndex,
		parser QueryParser, query string, advanced bool) (*QueryResults, error)
	FindDocumentsInCol(ctx context.Context, dictSearcher dictionary.ReverseIndex,
		parser QueryParser, query, col_gloss_file string) (*QueryResults, error)
	GetColMap() map[string]string
	Inititialized() bool
}

DocFinder finds documents.

func NewDocFinder added in v0.0.17

func NewDocFinder(ctx context.Context,
	database *sql.DB,
	docMap map[string]DocInfo) DocFinder

NewDocFinder creates and initializes an implementation of the DocFinder interface.

type DocInfo added in v0.0.60

type DocInfo struct {
	CorpusFile, GlossFile, Title, TitleCN, TitleEN, CollectionFile, CollectionTitle string
}

type DocTitleFinder added in v0.0.52

type DocTitleFinder interface {
	FindDocuments(ctx context.Context, query string) (*QueryResults, error)
}

DocTitleFinder finds documents by title.

func NewDocTitleFinder added in v0.0.52

func NewDocTitleFinder(infoCache map[string]DocInfo) DocTitleFinder

NewDocTitleFinder initializes a DocTitleFinder implementation.

Params:

infoCache: the map key is the Chinese part of the title
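As a rough sketch of why the map is keyed by the Chinese part of the title, a lookup could be as simple as the following. The docInfo type copies a subset of the DocInfo fields, findByTitle is a hypothetical helper, and the sample entry is made up. The package's actual matching logic may differ.

```go
package main

import "fmt"

// docInfo is a local stand-in with a subset of the DocInfo fields.
type docInfo struct {
	GlossFile, Title, TitleCN, TitleEN string
}

// findByTitle is a hypothetical exact-match lookup: because the cache
// is keyed by the Chinese title, a Chinese query can index it directly.
func findByTitle(infoCache map[string]docInfo, query string) (docInfo, bool) {
	d, ok := infoCache[query]
	return d, ok
}

func main() {
	// Made-up sample data, for illustration only.
	cache := map[string]docInfo{
		"論語": {GlossFile: "lunyu.html", Title: "論語 Analects", TitleCN: "論語", TitleEN: "Analects"},
	}
	if d, ok := findByTitle(cache, "論語"); ok {
		fmt.Println(d.TitleEN, d.GlossFile)
	}
}
```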

type Document

type Document struct {
	GlossFile, Title, CollectionFile, CollectionTitle, ContainsWords string
	ContainsBigrams                                                  string
	SimTitle, SimWords, SimBigram, SimBitVector, Similarity          float64
	ContainsTerms                                                    []string
	MatchDetails                                                     fulltext.MatchingText
	TitleCNMatch                                                     bool
}

func (Document) String

func (doc Document) String() string

String formats the retrieved document metadata for printing.

type QueryParser

type QueryParser interface {
	ParseQuery(query string) []TextSegment
}

QueryParser parses input queries into a slice of text segments.
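Because QueryParser is a one-method interface, any type with a matching ParseQuery method satisfies it. The sketch below shows this with local stand-in types, since the real TextSegment depends on dicttypes. The names textSegment, queryParser, and whitespaceParser are hypothetical, and splitting on whitespace is only an illustration, not how the package tokenizes queries.

```go
package main

import (
	"fmt"
	"strings"
)

// textSegment is a simplified stand-in for TextSegment.
type textSegment struct {
	QueryText string
}

// queryParser mirrors the shape of the QueryParser interface.
type queryParser interface {
	ParseQuery(query string) []textSegment
}

// whitespaceParser is a hypothetical implementation that splits the
// query on runs of whitespace.
type whitespaceParser struct{}

func (whitespaceParser) ParseQuery(query string) []textSegment {
	var segs []textSegment
	for _, f := range strings.Fields(query) {
		segs = append(segs, textSegment{QueryText: f})
	}
	return segs
}

func main() {
	var p queryParser = whitespaceParser{}
	for _, s := range p.ParseQuery("heart sutra") {
		fmt.Println(s.QueryText)
	}
}
```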

func MakeQueryParser

func MakeQueryParser(dict map[string]*dicttypes.Word) QueryParser

MakeQueryParser creates a QueryParser from the given dictionary.

type QueryResults

type QueryResults struct {
	Query, CollectionFile        string
	NumCollections, NumDocuments int
	Collections                  []Collection
	Documents                    []Document
	Terms                        []TextSegment
	SimilarTerms                 []TextSegment
}

type TextSegment

type TextSegment struct {
	QueryText string
	DictEntry dicttypes.Word
	Senses    []dicttypes.WordSense
}

A TextSegment contains the QueryText searched for and, where one exists, a matching dictionary entry. Only Chinese words that are in the dictionary have matching dictionary entries. Non-Chinese text, punctuation, and unknown Chinese words have nil DictEntry values; any matching values are included in the Senses field.
