Documentation ¶
Index ¶
- Constants
- func Calculate(ranks *Rank, algorithm Algorithm)
- type Algorithm
- type AlgorithmChain
- type AlgorithmDefault
- type Phrase
- type Rank
- func (rank *Rank) AddNewWord(word string, prevWordIdx int, sentenceID int) (wordID int)
- func (rank *Rank) GetWordData() map[int]*Word
- func (rank *Rank) IsWordExist(word string) bool
- func (rank *Rank) UpdateRightConnection(wordID int, rightWordID int)
- func (rank *Rank) UpdateWord(word string, prevWordIdx int, sentenceID int) (wordID int)
- type Relation
- type Score
- type Sentence
- type SingleWord
- type Word
Constants ¶
const ByQty = 0
ByQty filters by word occurrence.
const ByRelation = 1
ByRelation filters by phrase weight.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Algorithm ¶
type Algorithm interface {
	WeightingRelation(word1ID int, word2ID int, rank *Rank) float32
	WeightingHits(wordID int, rank *Rank) float32
}
Algorithm interface and its methods make the weighting process polymorphic: any type that implements both methods can be passed in as the weighting algorithm.
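As an illustration of that polymorphism, the sketch below declares a custom implementation of the interface. It is not part of the package: `AlgorithmShare` and its scoring rule are hypothetical, and `Word` and `Rank` are trimmed local stand-ins for the package's types, declared only so the example compiles on its own.

```go
package main

import "fmt"

// Word and Rank are trimmed local stand-ins for the package's types,
// declared here only so the sketch is self-contained.
type Word struct {
	ID    int
	Token string
	Qty   int
}

type Rank struct {
	Max   float32
	Min   float32
	Words map[int]*Word
}

// Algorithm mirrors the interface documented above.
type Algorithm interface {
	WeightingRelation(word1ID int, word2ID int, rank *Rank) float32
	WeightingHits(wordID int, rank *Rank) float32
}

// AlgorithmShare is a hypothetical implementation. It scores a word by its
// share of the most frequent word's count, and a phrase by the mean of the
// two word scores. This rule is an assumption made for illustration.
type AlgorithmShare struct{}

func (a *AlgorithmShare) WeightingHits(wordID int, rank *Rank) float32 {
	word, ok := rank.Words[wordID]
	if !ok || rank.Max == 0 {
		return 0
	}
	return float32(word.Qty) / rank.Max
}

func (a *AlgorithmShare) WeightingRelation(word1ID int, word2ID int, rank *Rank) float32 {
	return (a.WeightingHits(word1ID, rank) + a.WeightingHits(word2ID, rank)) / 2
}

func main() {
	rank := &Rank{Max: 4, Min: 1, Words: map[int]*Word{
		0: {ID: 0, Token: "go", Qty: 4},
		1: {ID: 1, Token: "rank", Qty: 2},
	}}
	// Any value that satisfies Algorithm can be swapped in here.
	var algorithm Algorithm = &AlgorithmShare{}
	fmt.Println(algorithm.WeightingHits(1, rank))
	fmt.Println(algorithm.WeightingRelation(0, 1, rank))
}
```

Because callers depend only on the interface, replacing `AlgorithmShare` with another implementation would not require changes to the calling code.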
type AlgorithmChain ¶
type AlgorithmChain struct{}
AlgorithmChain struct is the combined implementation of Algorithm. It is a good example of how weighting can be changed by a different implementation. It can weight a word or phrase by comparing them.
func NewAlgorithmChain ¶
func NewAlgorithmChain() *AlgorithmChain
NewAlgorithmChain constructor returns an AlgorithmChain pointer.
func (*AlgorithmChain) WeightingHits ¶
func (a *AlgorithmChain) WeightingHits(wordID int, rank *Rank) float32
WeightingHits method ranks the words by their occurrence.
func (*AlgorithmChain) WeightingRelation ¶
func (a *AlgorithmChain) WeightingRelation(word1ID int, word2ID int, rank *Rank) float32
WeightingRelation method weights a phrase with an algorithm that combines text rank and word occurrence.
type AlgorithmDefault ¶
type AlgorithmDefault struct{}
AlgorithmDefault struct is the basic implementation of Algorithm. It can weight a word or phrase by comparing them.
func NewAlgorithmDefault ¶
func NewAlgorithmDefault() *AlgorithmDefault
NewAlgorithmDefault constructor returns an AlgorithmDefault pointer.
func (*AlgorithmDefault) WeightingHits ¶
func (a *AlgorithmDefault) WeightingHits(wordID int, rank *Rank) float32
WeightingHits method ranks the words by their occurrence.
func (*AlgorithmDefault) WeightingRelation ¶
func (a *AlgorithmDefault) WeightingRelation(word1ID int, word2ID int, rank *Rank) float32
WeightingRelation method is the traditional text rank algorithm for weighting a phrase.
type Phrase ¶
Phrase struct contains a single phrase and its data.
LeftID is the ID of the first word.
RightID is the ID of the second word.
Left is the token of the first word.
Right is the token of the second word.
Weight is between 0.00 and 1.00.
Qty is the number of occurrences of the phrase.
func FindPhrases ¶
FindPhrases function has a wrapper, textrank.FindPhrases. Use the wrapper instead.
type Rank ¶
type Rank struct {
	Max         float32
	Min         float32
	Relation    Relation
	SentenceMap map[int]string
	Words       map[int]*Word
	WordValID   map[string]int
}
Rank struct contains all original raw sentences, words, tokens, phrases, indexes, word hits, phrase hits and minimum-maximum values.
Max is the occurrence count of the most used word.
Min is the occurrence count of the least used word. It is always greater than 0.
Relation is the Relation object that contains the phrases.
SentenceMap contains raw sentences. Index is the sentence ID, value is the sentence itself.
Words contains Word objects. Index is the word ID, value is a pointer to the Word that holds the token and its data.
WordValID contains words. Index is the word/token, value is the ID.
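The two maps work as a pair: WordValID resolves a token to its ID, and Words resolves that ID to the Word. A minimal sketch of the lookup follows. The `lookup` helper is hypothetical and not part of the package, and `Word` and `Rank` are trimmed local copies of the documented structs.

```go
package main

import "fmt"

// Word and Rank are trimmed local copies of the documented structs,
// kept only to the fields this sketch needs.
type Word struct {
	ID    int
	Token string
	Qty   int
}

type Rank struct {
	Words     map[int]*Word
	WordValID map[string]int
}

// lookup is a hypothetical helper. It resolves a token to its ID through
// WordValID, then resolves the ID to the Word through Words.
func lookup(rank *Rank, token string) (*Word, bool) {
	id, ok := rank.WordValID[token]
	if !ok {
		return nil, false
	}
	word, ok := rank.Words[id]
	return word, ok
}

func main() {
	rank := &Rank{
		Words:     map[int]*Word{0: {ID: 0, Token: "gopher", Qty: 3}},
		WordValID: map[string]int{"gopher": 0},
	}
	if word, ok := lookup(rank, "gopher"); ok {
		fmt.Println(word.ID, word.Qty)
	}
	_, ok := lookup(rank, "missing")
	fmt.Println(ok)
}
```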
func (*Rank) AddNewWord ¶
AddNewWord method adds a new word to the rank object and assigns its ID.
func (*Rank) GetWordData ¶
GetWordData method returns all words as a map from word ID to Word pointer.
func (*Rank) IsWordExist ¶
IsWordExist method returns true when the given word is already in the rank.
func (*Rank) UpdateRightConnection ¶
UpdateRightConnection method adds the right connection to the word. It can be used once a word has been added and the next word is known.
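The sketch below illustrates the bookkeeping these methods describe: each token gets a stable ID on first sight, and each adjacent pair updates the right connection of the earlier word and the left connection of the later one. It is an assumption about the general idea, not the package's actual code. The helpers `index` and `link` are hypothetical, and `Word` is a trimmed local copy of the documented struct.

```go
package main

import "fmt"

// Word is a trimmed local copy of the documented struct.
type Word struct {
	ID              int
	ConnectionLeft  map[int]int
	ConnectionRight map[int]int
	Token           string
	Qty             int
}

// link is a hypothetical helper. It records that rightID followed leftID
// once more, on both sides of the pair.
func link(words map[int]*Word, leftID int, rightID int) {
	words[leftID].ConnectionRight[rightID]++
	words[rightID].ConnectionLeft[leftID]++
}

// index is a hypothetical helper. It assigns IDs in order of first
// appearance, counts occurrences, and links each token to its predecessor.
func index(tokens []string) (map[int]*Word, map[string]int) {
	words := map[int]*Word{}
	ids := map[string]int{}
	prevID := -1 // -1 means there is no previous word yet
	for _, token := range tokens {
		id, ok := ids[token]
		if !ok {
			id = len(words)
			ids[token] = id
			words[id] = &Word{
				ID:              id,
				Token:           token,
				ConnectionLeft:  map[int]int{},
				ConnectionRight: map[int]int{},
			}
		}
		words[id].Qty++
		if prevID >= 0 {
			link(words, prevID, id)
		}
		prevID = id
	}
	return words, ids
}

func main() {
	words, ids := index([]string{"text", "rank", "text", "rank", "go"})
	rankWord := words[ids["rank"]]
	fmt.Println(rankWord.Qty, rankWord.ConnectionLeft[ids["text"]], rankWord.ConnectionRight[ids["go"]])
}
```

The right connection can only be recorded when the following word is known, which is why the sketch links a pair while processing its second word.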
type Relation ¶
Relation struct contains the phrase data.
Max is the occurrence count of the most used phrase.
Min is the occurrence count of the least used phrase. It is always greater than 0.
Node contains the Scores. The first ID is word 1, the second ID is word 2, and the value is the Score that holds the data about their relation.
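One way to picture Node is as a two-level map keyed by the two word IDs. The sketch below assumes that shape. The field types of `Relation` and `Score` are not shown on this page, so the types used here are assumptions, and `addPhrase` is a hypothetical helper rather than a package function.

```go
package main

import "fmt"

// Score and Relation use the documented field names. The field types are
// assumptions, since this page does not show the struct declarations.
type Score struct {
	Qty         int
	Weight      float32
	SentenceIDs []int
}

type Relation struct {
	Max  float32
	Min  float32
	Node map[int]map[int]Score
}

// addPhrase is a hypothetical helper. It records one occurrence of the
// phrase (word1ID, word2ID) found in the given sentence.
func addPhrase(relation *Relation, word1ID int, word2ID int, sentenceID int) {
	if relation.Node == nil {
		relation.Node = map[int]map[int]Score{}
	}
	if relation.Node[word1ID] == nil {
		relation.Node[word1ID] = map[int]Score{}
	}
	// A missing entry yields the zero Score, which is a valid starting point.
	score := relation.Node[word1ID][word2ID]
	score.Qty++
	score.SentenceIDs = append(score.SentenceIDs, sentenceID)
	relation.Node[word1ID][word2ID] = score
}

func main() {
	relation := &Relation{}
	addPhrase(relation, 0, 1, 0)
	addPhrase(relation, 0, 1, 2)
	score := relation.Node[0][1]
	fmt.Println(score.Qty, score.SentenceIDs)
}
```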
type Score ¶
Score struct contains data about a relation of two words.
Qty is the number of occurrences of the phrase.
Weight is the weight of the phrase between 0.00 and 1.00.
SentenceIDs contains the IDs of all sentences that contain the phrase.
type Sentence ¶
Sentence struct contains a single sentence and its data.
func FindSentences ¶
FindSentences function has two wrappers, textrank.FindSentencesByRelationWeight and textrank.FindSentencesByWordQtyWeight. Use the wrappers instead.
func FindSentencesByPhrases ¶
FindSentencesByPhrases function has a wrapper, textrank.FindSentencesByPhraseChain. Use the wrapper instead.
type SingleWord ¶
SingleWord struct contains a single word and its data.
ID of the word.
Word itself, the token.
Weight of the word between 0.00 and 1.00.
Quantity of the word.
func FindSingleWords ¶
func FindSingleWords(ranks *Rank) []SingleWord
FindSingleWords function has a wrapper, textrank.FindSingleWords. Use the wrapper instead.
type Word ¶
type Word struct {
	ID              int
	SentenceIDs     []int
	ConnectionLeft  map[int]int
	ConnectionRight map[int]int
	Token           string
	Qty             int
	Weight          float32
}
Word struct contains all data about the words.
If a word occurs multiple times in the text, every occurrence points to the same ID, so each Word is unique.
SentenceIDs contains the IDs of all sentences that contain the word.
ConnectionLeft contains all words that are connected to this word on the left side. The map index is the ID of the related word and its value is the number of occurrences.
ConnectionRight contains all words that are connected to this word on the right side. The map index is the ID of the related word and its value is the number of occurrences.
Token is the word itself in its tokenized form, not the original text.
Qty is the number of occurrences of the word.
Weight is the weight of the word between 0.00 and 1.00.
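A weight bounded to the 0.00 to 1.00 range can be obtained from a raw count by min-max scaling against the Min and Max values kept on Rank. The sketch below shows that technique only as an assumption. This page does not state which formula the package uses, and `normalize` is a hypothetical helper.

```go
package main

import "fmt"

// normalize is a hypothetical helper that applies min-max scaling, mapping
// min to 0 and max to 1. When min equals max every value is tied, so the
// sketch arbitrarily returns 1 to avoid dividing by zero.
func normalize(qty float32, min float32, max float32) float32 {
	if max == min {
		return 1
	}
	return (qty - min) / (max - min)
}

func main() {
	// Counts 1, 3 and 5 with Min = 1 and Max = 5.
	fmt.Println(normalize(1, 1, 5), normalize(3, 1, 5), normalize(5, 1, 5))
}
```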