Documentation
Index
- func EncodeInvertedIndexKey(inKey []byte, lexeme string) []byte
- func EncodeInvertedIndexKeys(inKey []byte, vector TSVector) ([][]byte, error)
- func EncodeTSQuery(appendTo []byte, query TSQuery) ([]byte, error)
- func EncodeTSQueryPGBinary(appendTo []byte, query TSQuery) []byte
- func EncodeTSVector(appendTo []byte, vector TSVector) ([]byte, error)
- func EncodeTSVectorPGBinary(appendTo []byte, vector TSVector) ([]byte, error)
- func EvalTSQuery(q TSQuery, v TSVector) (bool, error)
- func GetConfigKey(config string) string
- func Rank(weights []float32, v TSVector, q TSQuery, method int) (float32, error)
- func TSLexize(config string, token string) (lexeme string, stopWord bool, err error)
- func TSParse(input string) []string
- func ValidConfig(input string) error
- type TSQuery
- func DecodeTSQuery(b []byte) (ret TSQuery, err error)
- func DecodeTSQueryPGBinary(b []byte) (ret TSQuery, err error)
- func ParseTSQuery(input string) (TSQuery, error)
- func PhraseToTSQuery(config string, input string) (TSQuery, error)
- func PlainToTSQuery(config string, input string) (TSQuery, error)
- func RandomTSQuery(rng *rand.Rand) TSQuery
- func ToTSQuery(config string, input string) (TSQuery, error)
- type TSVector
Constants
This section is empty.
Variables
This section is empty.
Functions
func EncodeInvertedIndexKey
EncodeInvertedIndexKey returns the inverted index key for the input lexeme.
func EncodeInvertedIndexKeys
EncodeInvertedIndexKeys returns a slice of byte slices containing one inverted index key per term in the tsvector.
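A minimal sketch of the one-key-per-term idea. It assumes, hypothetically, that each key is just the index prefix `inKey` followed by the raw lexeme bytes; the real key encoding is not described here and likely differs. `encodeKey` and `encodeKeys` are illustrative names, not this package's API.

```go
package main

import "fmt"

// encodeKey is a hypothetical stand-in for EncodeInvertedIndexKey. It copies
// the prefix into a fresh slice so that keys never share a backing array.
func encodeKey(inKey []byte, lexeme string) []byte {
	key := make([]byte, 0, len(inKey)+len(lexeme))
	key = append(key, inKey...)
	return append(key, lexeme...)
}

// encodeKeys is a hypothetical stand-in for EncodeInvertedIndexKeys: one key
// per lexeme, in the order of the input.
func encodeKeys(inKey []byte, lexemes []string) [][]byte {
	keys := make([][]byte, 0, len(lexemes))
	for _, l := range lexemes {
		keys = append(keys, encodeKey(inKey, l))
	}
	return keys
}

func main() {
	for _, k := range encodeKeys([]byte("idx/"), []string{"cat", "rat"}) {
		fmt.Printf("%q\n", k)
	}
}
```

The explicit copy matters: if `inKey` has spare capacity, `append(inKey, lexeme...)` would write every key into the same backing array, and earlier keys would be silently overwritten.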
func EncodeTSQuery
EncodeTSQuery encodes a tsquery into a serialized representation for on-disk storage.
func EncodeTSQueryPGBinary
EncodeTSQueryPGBinary encodes a tsquery into Postgres's binary wire protocol representation.
The following description of the wire protocol representation is taken from https://www.npgsql.org/dev/types.html:
The tree is written in prefix notation. First comes the number of tokens (a token is an operand or an operator). Then, for each token:
- UInt8 type (1 = val, 2 = oper), followed by
- for val: UInt8 weight, UInt8 prefix (1 = yes, 0 = no), and a null-terminated string;
- for oper: UInt8 oper (1 = not, 2 = and, 3 = or, 4 = phrase). For the phrase operator, an additional UInt16 field carries the operator's distance: 1 for <->, otherwise the n value in '<n>'.
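The token layout above can be sketched as a small encoder over a hypothetical AST (`node`, `encode`, and `writeTokens` are illustrative, not this package's types). Two details are assumptions not stated in the description: the token count is written as a big-endian UInt32, and the right operand of a binary operator is emitted before the left one, mirroring Postgres's in-memory tsquery layout.

```go
package main

import "fmt"

// node is a hypothetical tsquery AST node, not this package's TSQuery.
type node struct {
	lexeme      string // operand text (op == 0)
	weight      uint8  // operand weight bitmask (0 in these examples)
	prefix      bool   // operand is a prefix match, as in 'rat':*
	op          uint8  // 0 = operand; 1 = not, 2 = and, 3 = or, 4 = phrase
	distance    uint16 // phrase distance: 1 for <->, n for <n>
	left, right *node  // "not" uses left only
}

// count returns the number of tokens (operands plus operators) in the tree.
func count(n *node) uint32 {
	if n == nil {
		return 0
	}
	return 1 + count(n.left) + count(n.right)
}

// writeTokens appends the tokens in prefix notation (operator first).
// Assumption: the right operand is written before the left one.
func writeTokens(buf []byte, n *node) []byte {
	if n.op == 0 {
		prefix := byte(0)
		if n.prefix {
			prefix = 1
		}
		buf = append(buf, 1, n.weight, prefix) // type = val, weight, prefix flag
		buf = append(buf, n.lexeme...)
		return append(buf, 0) // null terminator
	}
	buf = append(buf, 2, n.op) // type = oper, operator code
	if n.op == 4 {
		buf = append(buf, byte(n.distance>>8), byte(n.distance)) // UInt16, big-endian
	}
	if n.right != nil {
		buf = writeTokens(buf, n.right)
	}
	return writeTokens(buf, n.left)
}

// encode prefixes the token stream with its token count, assumed here to be
// a big-endian UInt32.
func encode(q *node) []byte {
	c := count(q)
	buf := []byte{byte(c >> 24), byte(c >> 16), byte(c >> 8), byte(c)}
	return writeTokens(buf, q)
}

func main() {
	// 'cat' & 'rat'
	q := &node{op: 2, left: &node{lexeme: "cat"}, right: &node{lexeme: "rat"}}
	fmt.Printf("% x\n", encode(q))
}
```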
func EncodeTSVector
EncodeTSVector encodes a tsvector into a serialized representation for on-disk storage.
func EncodeTSVectorPGBinary
EncodeTSVectorPGBinary encodes a tsvector into a serialized representation that's identical to Postgres's wire protocol representation.
The following description of the wire protocol representation is taken from https://www.npgsql.org/dev/types.html:
tsvector:
- UInt32 number of lexemes
- for each lexeme:
  - lexeme text in client encoding, null-terminated
  - UInt16 number of positions
  - for each position: UInt16 WordEntryPos, where the 2 most significant bits are the weight and the 14 least significant bits are the position (which can't be 0). Weights 3, 2, 1, 0 represent A, B, C, D.
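A sketch of that layout, including the WordEntryPos bit packing. The `term` and `position` types and the `encodeVector` function are hypothetical; the sketch assumes UTF-8 client encoding and big-endian (network order) integers, and it does not validate that positions are non-zero.

```go
package main

import "fmt"

// position and term are hypothetical types, not this package's TSVector.
type position struct {
	pos    uint16 // 1..16383: must fit in 14 bits and can't be 0
	weight uint8  // 3 = A, 2 = B, 1 = C, 0 = D
}

type term struct {
	lexeme    string
	positions []position
}

func appendU16(b []byte, v uint16) []byte { return append(b, byte(v>>8), byte(v)) }

func appendU32(b []byte, v uint32) []byte {
	return append(b, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
}

// wordEntryPos packs the weight into the top 2 bits and the position into
// the low 14 bits.
func wordEntryPos(p position) uint16 {
	return uint16(p.weight&3)<<14 | p.pos&0x3FFF
}

// encodeVector writes the lexeme count, then each null-terminated lexeme
// followed by its position count and packed positions.
func encodeVector(terms []term) []byte {
	buf := appendU32(nil, uint32(len(terms)))
	for _, t := range terms {
		buf = append(buf, t.lexeme...)
		buf = append(buf, 0) // null terminator
		buf = appendU16(buf, uint16(len(t.positions)))
		for _, p := range t.positions {
			buf = appendU16(buf, wordEntryPos(p))
		}
	}
	return buf
}

func main() {
	// Equivalent to the tsvector 'cat':2A
	v := []term{{lexeme: "cat", positions: []position{{pos: 2, weight: 3}}}}
	fmt.Printf("% x\n", encodeVector(v))
}
```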
func EvalTSQuery
EvalTSQuery runs the provided TSQuery against the provided TSVector, returning whether or not the query matches the vector.
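A minimal sketch of query evaluation as a recursive walk over a boolean AST. The `query` type and `eval` function are hypothetical, and the sketch deliberately ignores positions, so it cannot evaluate phrase operators (<->, <n>), which the real EvalTSQuery must handle.

```go
package main

import "fmt"

// query is a hypothetical boolean tsquery AST, not this package's TSQuery.
type query struct {
	op          byte   // 0 = lexeme; '&', '|', or '!'
	lexeme      string // set when op == 0
	left, right *query // '!' uses left only
}

// eval reports whether q matches a document with the given set of lexemes.
// Positions are ignored, so phrase operators are not supported.
func eval(q *query, doc map[string]bool) bool {
	switch q.op {
	case 0:
		return doc[q.lexeme]
	case '&':
		return eval(q.left, doc) && eval(q.right, doc)
	case '|':
		return eval(q.left, doc) || eval(q.right, doc)
	case '!':
		return !eval(q.left, doc)
	default:
		panic(fmt.Sprintf("unsupported operator %q", q.op))
	}
}

func main() {
	doc := map[string]bool{"fat": true, "rat": true}
	// 'fat' & !'cat'
	q := &query{op: '&', left: &query{lexeme: "fat"}, right: &query{op: '!', left: &query{lexeme: "cat"}}}
	fmt.Println(eval(q, doc)) // true
}
```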
func GetConfigKey
GetConfigKey returns the key used to look up stemmers and stop words for an input config value. This simulates the more advanced, customizable dictionaries and configs of Postgres, which allows user-defined text search configurations; because of this, config names can carry schema prefixes. Since we don't (yet?) allow user-defined configurations, we only need to trim off a `pg_catalog.` prefix if one exists.
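The prefix trimming described above amounts to a single `strings.TrimPrefix` call. `configKey` is a hypothetical name; the real GetConfigKey may do more than this sketch shows.

```go
package main

import (
	"fmt"
	"strings"
)

// configKey trims an optional "pg_catalog." schema prefix from a config name.
func configKey(config string) string {
	return strings.TrimPrefix(config, "pg_catalog.")
}

func main() {
	fmt.Println(configKey("pg_catalog.english")) // english
	fmt.Println(configKey("english"))            // english
}
```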
func Rank
Rank implements ts_rank, which ranks a tsvector against a tsquery. The weights parameter lists the weights for the tsvector lexeme weights D, C, B, and A, in that order. The method parameter is a bitmask selecting ranking behaviors, which are defined by the unexported rankBehavior type in this package's source. The default behavior, 0, performs no normalization based on document length.
N.B.: this function is directly translated from the calc_rank function in tsrank.c, which contains almost no comments. As of this time, I am unable to sufficiently explain how this ranker works, but I'm confident that the implementation is at least compatible with Postgres. https://github.com/postgres/postgres/blob/765f5df726918bcdcfd16bcc5418e48663d1dd59/src/backend/utils/adt/tsrank.c#L357
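Two pieces of the interface can be illustrated without reproducing calc_rank itself. Because the weights slice is ordered D, C, B, A and a WordEntryPos weight code is 0 for D through 3 for A, the code can index the slice directly. The sketch also applies one normalization flag, Postgres's documented flag 32 (rank / (rank + 1)), to an already-computed raw rank. `weightFor` and `normalize` are hypothetical helpers, and the other flags are not modeled.

```go
package main

import "fmt"

// defaultWeights are Postgres's documented ts_rank defaults, in D, C, B, A order.
var defaultWeights = []float32{0.1, 0.2, 0.4, 1.0}

// weightFor maps a 2-bit position weight code (0 = D ... 3 = A) to its
// ranking weight; both use D..A order, so the code is the slice index.
func weightFor(weights []float32, code uint8) float32 {
	return weights[code&3]
}

// normRankPlusOne is Postgres's normalization flag 32: rank / (rank + 1).
const normRankPlusOne = 32

// normalize applies only flag 32 to an already-computed raw rank. The other
// flags (1, 2, 8, 16) need document statistics and are not modeled here.
func normalize(rank float32, method int) float32 {
	if method&normRankPlusOne != 0 && rank > 0 {
		rank /= rank + 1
	}
	return rank
}

func main() {
	fmt.Println(weightFor(defaultWeights, 3))  // 1
	fmt.Println(normalize(1, normRankPlusOne)) // 0.5
}
```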
func TSLexize
TSLexize implements the "dictionary" construct that's exposed via ts_lexize. It is invoked once per input token to produce an output lexeme during routines like to_tsvector and to_tsquery. Its second return value, stopWord, is true when the token is a stop word.
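A toy dictionary with the same result shape as TSLexize. Everything here is hypothetical: the "toy" config name, the three-word stop list, and the absence of stemming. A real config would stem (for example, "rats" to "rat") and use a full stop-word list.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stopWords is a tiny illustrative list, not any real config's list.
var stopWords = map[string]bool{"the": true, "a": true, "and": true}

// lexize lowercases the token and reports stop words via its second result.
// It performs no stemming and only accepts the hypothetical "toy" config.
func lexize(config, token string) (lexeme string, stopWord bool, err error) {
	if config != "toy" {
		return "", false, errors.New("unsupported config: " + config)
	}
	lexeme = strings.ToLower(token)
	if stopWords[lexeme] {
		return "", true, nil
	}
	return lexeme, false, nil
}

func main() {
	for _, tok := range []string{"The", "Rats"} {
		lexeme, stop, err := lexize("toy", tok)
		fmt.Printf("%q stop=%v err=%v\n", lexeme, stop, err)
	}
}
```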
func TSParse
TSParse splits an input text into a list of tokens. For now, the parser we use is very simple: it lowercases the input and treats every character that is neither a letter nor a number as whitespace, splitting the input into tokens on those characters.
The Postgres text search parser is much, much more sophisticated. The documentation (https://www.postgresql.org/docs/current/textsearch-parsers.html) gives more information, but roughly, each token is categorized into one of about 20 different buckets, such as asciiword, url, email, host, float, int, version, tag, etc. It uses very specific rules to produce these outputs. Another interesting transformation is returning multiple tokens for a hyphenated word, including a token that represents the entire hyphenated word, as well as one for each of the hyphenated components.
It's not clear whether we need to exactly mimic this functionality. Likely, we will eventually want to do this.
func ValidConfig
ValidConfig returns an error if the input string is not a supported and valid text search config.
Types
type TSQuery
type TSQuery struct {
// contains filtered or unexported fields
}
TSQuery represents a tsNode AST root. A TSQuery is a tree of text search operators that can be run against a TSVector to produce a predicate of whether the query matched.
func DecodeTSQuery
DecodeTSQuery deserializes a serialized TSQuery in on-disk format.
func DecodeTSQueryPGBinary
DecodeTSQueryPGBinary deserializes a serialized TSQuery in pgwire format.
func ParseTSQuery
ParseTSQuery produces a TSQuery from an input string.
func PhraseToTSQuery
PhraseToTSQuery implements the phraseto_tsquery builtin, which lexes an input, performs stopwording and normalization on the tokens, and returns a parsed query, interposing the <-> operator between each token.
func PlainToTSQuery
PlainToTSQuery implements the plainto_tsquery builtin, which lexes an input, performs stopwording and normalization on the tokens, and returns a parsed query, interposing the & operator between each token.
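The final step of both builtins, interposing an operator between already-normalized lexemes, can be sketched as string building in tsquery text syntax. `joinLexemes` is hypothetical and skips the parse step. It also omits one Postgres detail: phraseto_tsquery widens the distance when stop words were removed (for example, `'cat' <2> 'rat'` for "cat and rat").

```go
package main

import (
	"fmt"
	"strings"
)

// joinLexemes quotes each lexeme (doubling embedded quotes) and interposes
// op: "&" for plainto_tsquery, "<->" for phraseto_tsquery.
func joinLexemes(lexemes []string, op string) string {
	quoted := make([]string, len(lexemes))
	for i, l := range lexemes {
		quoted[i] = "'" + strings.ReplaceAll(l, "'", "''") + "'"
	}
	return strings.Join(quoted, " "+op+" ")
}

func main() {
	fmt.Println(joinLexemes([]string{"fat", "rat"}, "&"))   // 'fat' & 'rat'
	fmt.Println(joinLexemes([]string{"fat", "rat"}, "<->")) // 'fat' <-> 'rat'
}
```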
func RandomTSQuery
RandomTSQuery returns a random TSQuery for testing.
func ToTSQuery
ToTSQuery implements the to_tsquery builtin, which lexes an input, performs stopwording and normalization on the tokens, and returns a parsed query.
func (TSQuery) GetInvertedExpr
func (q TSQuery) GetInvertedExpr() (expr inverted.Expression, err error)
GetInvertedExpr returns the inverted expression that can be used to search an index.
type TSVector
type TSVector []tsTerm
TSVector is a sorted list of terms, each of which is a lexeme that might have an associated position within an original document.
func DecodeTSVector
DecodeTSVector decodes a tsvector in disk-storage representation from the input byte slice.
func DecodeTSVectorPGBinary
DecodeTSVectorPGBinary decodes a tsvector from the input byte slice which is formatted in Postgres binary protocol.
func DocumentToTSVector
DocumentToTSVector parses an input document into lexemes, removes stop words, stems and normalizes the lexemes, and returns a TSVector annotated with lexeme positions according to a text search configuration passed by name.
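The pipeline described above (tokenize, drop stop words, record positions, group by lexeme) can be sketched with hypothetical pieces: the simple tokenizer described for TSParse, a toy stop-word list, and no stemming. As in Postgres, removed stop words still consume a position, and positions are 1-based.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stop is a toy stop-word list for illustration only.
var stop = map[string]bool{"the": true, "a": true, "and": true}

// toVector maps each non-stop-word lexeme to its 1-based positions.
func toVector(doc string) map[string][]int {
	vec := map[string][]int{}
	tokens := strings.FieldsFunc(strings.ToLower(doc), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsNumber(r)
	})
	for i, tok := range tokens {
		if stop[tok] {
			continue // dropped, but its position is still consumed
		}
		vec[tok] = append(vec[tok], i+1)
	}
	return vec
}

// format renders the vector sorted by lexeme, e.g. 'fat':2 'rat':3.
func format(vec map[string][]int) string {
	keys := make([]string, 0, len(vec))
	for k := range vec {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	for i, k := range keys {
		if i > 0 {
			sb.WriteByte(' ')
		}
		fmt.Fprintf(&sb, "'%s':", k)
		for j, p := range vec[k] {
			if j > 0 {
				sb.WriteByte(',')
			}
			fmt.Fprintf(&sb, "%d", p)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(format(toVector("The fat rat and the fat cat"))) // 'cat':7 'fat':2,6 'rat':3
}
```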
func ParseTSVector
ParseTSVector produces a TSVector from an input string. The result is sorted by lexeme, but its lexemes are not automatically stemmed or stop-worded.
func RandomTSVector
RandomTSVector returns a random TSVector for testing.
func (TSVector) StringSize
StringSize returns the length of the string that String() would return, without actually constructing that string.