Documentation ¶
Constants ¶
This section is empty.
Variables ¶
var (
	// EnStopWords is the English stop-word table.
	EnStopWords = map[string]struct{}{} /* 561 elements not displayed */

	// ChStopWords is the Chinese stop-word table.
	ChStopWords = map[string]struct{}{} /* 1513 elements not displayed */

	// SpStopWords is the stop-word table for special words.
	SpStopWords = map[string]struct{}{} /* 250 elements not displayed */
)
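The stop-word tables are sets keyed by word, so a membership check is a plain map lookup. The following is a minimal sketch of how such a table might be used to filter tokens. The `stopWords` map is a three-entry stand-in for the real tables, `FilterStopWords` is a hypothetical helper that is not part of this package, and lowercasing before lookup is an assumption about how the tables are keyed.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny stand-in for a table such as EnStopWords.
// It is illustrative only; the real tables are much larger.
var stopWords = map[string]struct{}{
	"the": {}, "a": {}, "of": {},
}

// FilterStopWords is a hypothetical helper (not part of this package).
// It returns the tokens that do not appear in the stop set, comparing
// in lower case (an assumption about how the table is keyed).
func FilterStopWords(tokens []string, stop map[string]struct{}) []string {
	kept := make([]string, 0, len(tokens))
	for _, tok := range tokens {
		if _, isStop := stop[strings.ToLower(tok)]; !isStop {
			kept = append(kept, tok)
		}
	}
	return kept
}

func main() {
	tokens := strings.Fields("The anatomy of a search engine")
	fmt.Println(FilterStopWords(tokens, stopWords)) // prints [anatomy search engine]
}
```

Using `struct{}` as the value type keeps the set compact, because an empty struct occupies no storage per entry.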
Functions ¶
func Bytes2String ¶
Bytes2String converts a byte slice to a string without copying; the string and the slice share the same underlying memory, so the slice must not be modified while the string is in use.
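A zero-copy conversion of this kind can be sketched with the `unsafe` package. This is not necessarily how Bytes2String is implemented; `bytesToString` below is a hypothetical name, and the sketch assumes Go 1.20 or later, where `unsafe.String` and `unsafe.SliceData` are available.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString is a hypothetical zero-copy conversion (requires Go 1.20+).
// The returned string aliases b's backing array, so b must not be
// modified for as long as the string is in use.
func bytesToString(b []byte) string {
	// unsafe.SliceData returns a pointer to the slice's first element
	// (nil for a nil slice); unsafe.String builds a string header over it.
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hello")
	s := bytesToString(b)
	fmt.Println(s, len(s))
}
```

The trade-off is speed against safety: the ordinary conversion `string(b)` copies the bytes and is always safe, while the aliasing version skips the copy but breaks the immutability guarantee of strings if the slice is later written to.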
func GetByteOrder ¶
Types ¶
type HashWeightPair ¶
type HashWeightPair struct {
// contains filtered or unexported fields
}
type LanguageType ¶
type LanguageType int8
const (
	ENGLISH LanguageType = 0
	CHINESE LanguageType = 1
)
type SimHash ¶
type SimHash struct {
// contains filtered or unexported fields
}
SimHash implements the simhash fingerprinting technique described in "Detecting Near-Duplicates for Web Crawling".
func NewSimHash ¶
func NewSimHash(language LanguageType, dict string) *SimHash
func (*SimHash) Fingerprint ¶
func (*SimHash) FingerprintToString ¶
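The signatures of Fingerprint and FingerprintToString are not shown here, so the following is only a general sketch of the simhash technique, not this package's implementation. Every weighted feature votes on each of the 64 bit positions according to its hash, and the sign of each total becomes one fingerprint bit. The names `feature`, `hash64`, `simhash`, and `hammingDistance` are hypothetical, and FNV-1a is an assumed choice of hash function.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
)

// feature is a hypothetical (token, weight) pair.
type feature struct {
	token  string
	weight int
}

// hash64 hashes a token with FNV-1a (an assumed choice of hash).
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// simhash folds weighted features into a 64-bit fingerprint.
func simhash(features []feature) uint64 {
	var votes [64]int
	for _, f := range features {
		h := hash64(f.token)
		for i := 0; i < 64; i++ {
			if h&(uint64(1)<<uint(i)) != 0 {
				votes[i] += f.weight // bit set: vote up by the weight
			} else {
				votes[i] -= f.weight // bit clear: vote down by the weight
			}
		}
	}
	var fp uint64
	for i, v := range votes {
		if v > 0 {
			fp |= uint64(1) << uint(i)
		}
	}
	return fp
}

// hammingDistance counts the bit positions where two fingerprints differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	a := simhash([]feature{{"near", 2}, {"duplicate", 3}, {"detection", 1}})
	b := simhash([]feature{{"near", 2}, {"duplicate", 3}, {"crawling", 1}})
	fmt.Printf("%016x %016x distance=%d\n", a, b, hammingDistance(a, b))
}
```

Documents whose fingerprints differ in only a few bits are treated as near-duplicates; the distance threshold is an application-level choice.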