Documentation ¶
Index ¶
- func GetDoubleMetaphone(document string, dc TextCleanserDecorator) []string
- func GetWords(document string, dc TextCleanserDecorator) []string
- func Ident(s string) string
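- type Doc2Words
- func GetDoubleMetaphoneFactory(dc TextCleanserDecorator) Doc2Words
- func GetWordsFactory(dc TextCleanserDecorator) Doc2Words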
- type TextCleanser
- func Compact(c TextCleanser) TextCleanser
- func RemovePunctuation(c TextCleanser) TextCleanser
- func SplitCamelCase(c TextCleanser) TextCleanser
- func SplitCamelCaseUnicode(c TextCleanser) TextCleanser
- func ToDoubleMetaphone(c TextCleanser) TextCleanser
- func ToLower(c TextCleanser) TextCleanser
- func Trim(c TextCleanser) TextCleanser
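- type TextCleanserDecorator
- func Decorators(ds ...TextCleanserDecorator) TextCleanserDecorator
- func ToAppend(suffix string) TextCleanserDecorator
- func ToPrepend(prefix string) TextCleanserDecorator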
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func GetDoubleMetaphone ¶
func GetDoubleMetaphone(document string, dc TextCleanserDecorator) []string
func GetWords ¶
func GetWords(document string, dc TextCleanserDecorator) []string
Types ¶
type Doc2Words ¶
Doc2Words defines the function type that converts a document into a list of words.
Example ¶
To run this example as a standalone program, change the package to `main` and the next function definition to `func main()`.
//package main
package main

import (
	"fmt"

	"github.com/go-dedup/text"
)

var Doc2words = text.GetWordsFactory(text.Decorators(
	text.SplitCamelCase,
	text.ToLower,
	text.RemovePunctuation,
	text.Compact,
	text.Trim,
))

// for standalone test, change package to `main` and the next func def to,
// func main() {
func main() {
	for _, d := range testDoc {
		fmt.Printf("%v\n", Doc2words(string(d)))
	}
}

var testDoc = [][]byte{
	[]byte("(ebook) GNU - PYTHON Standard Library with myConstantVariable (2001)"),
	// }
	// var testDoc2 = [][]byte{
	[]byte("Ford F-150. Lariat DO NOT BUY. Truck has been in the shop 50 days so far. It has had a vibration since day one and Ford cannot get rid of it. The have done everything possible to the underside of this truck and it is… 11,000km | Automatic"),
	[]byte("2016 Ford Mustang 2016 Ford Mustang white with black stripes, this car is in showroom shape and it only has 14,000kms. this beast has never been in an accident nor does it have one scratch on the body. i purchased 20… 14,000km | Automatic"),
	[]byte("2013 Ford Fiesta Sedan - 22,116 kms Body is in perfect condition. No mechanical problems. Oil change and maintenance package done in March/17. Registered inspection done in April/16. $10,000 firm (sales tax is extra). Call … 22,120km | Automatic"),
	[]byte("2015 Ford Explorer Sport SUV, Crossover This vehicle is a real beauty and a pleasure to drive. It is in excellent condition and has been store inside since purchased in 2015. It has not been driven in winter other then to go for service.!… 18,600km | Automatic"),
	[]byte("2013 Ford Fiesta Sedan - 22,116 kms Body is in perfect condition. No mechanical problems. Oil change and maintenance package done in March/17. Registered inspection done in April/16. $10,000 firm (sales tax is extra). Call … 22,120km | Automatic"),
	[]byte("2015 Ford Explorer Sport SUV, Crossover This vehicle is a real beauty and a pleasure to drive. It is in excellent condition and has been store inside since purchased in 2015. It has not been driven in winter other then to go for service.!… 18,600km | Automatic"),
	[]byte("Ford F-150. Lariat DO NOT BUY. Truck has been in the shop 50 days so far. It has had a vibration since day one and Ford cannot get rid of it. The have done everything possible to the underside of this truck and it is… 11,000km | Automatic"),
	[]byte("2016 Ford Mustang 2016 Ford Mustang white with black stripes, this car is in showroom shape and it only has 14,000kms. this beast has never been in an accident nor does it have one scratch on the body. i purchased 20… 14,000km | Automatic"),
}
Output:
[ebook gnu python standard library with my constant variable 2001]
[ford f 150 lariat do not buy truck has been in the shop 50 days so far it has had a vibration since day one and ford cannot get rid of it the have done everything possible to the underside of this truck and it is 11000km automatic]
[2016 ford mustang 2016 ford mustang white with black stripes this car is in showroom shape and it only has 14000kms this beast has never been in an accident nor does it have one scratch on the body i purchased 20 14000km automatic]
[2013 ford fiesta sedan 22116 kms body is in perfect condition no mechanical problems oil change and maintenance package done in march 17 registered inspection done in april 16 $10000 firm sales tax is extra call 22120km automatic]
[2015 ford explorer sport suv crossover this vehicle is a real beauty and a pleasure to drive it is in excellent condition and has been store inside since purchased in 2015 it has not been driven in winter other then to go for service 18600km automatic]
[2013 ford fiesta sedan 22116 kms body is in perfect condition no mechanical problems oil change and maintenance package done in march 17 registered inspection done in april 16 $10000 firm sales tax is extra call 22120km automatic]
[2015 ford explorer sport suv crossover this vehicle is a real beauty and a pleasure to drive it is in excellent condition and has been store inside since purchased in 2015 it has not been driven in winter other then to go for service 18600km automatic]
[ford f 150 lariat do not buy truck has been in the shop 50 days so far it has had a vibration since day one and ford cannot get rid of it the have done everything possible to the underside of this truck and it is 11000km automatic]
[2016 ford mustang 2016 ford mustang white with black stripes this car is in showroom shape and it only has 14000kms this beast has never been in an accident nor does it have one scratch on the body i purchased 20 14000km automatic]
func GetDoubleMetaphoneFactory ¶
func GetDoubleMetaphoneFactory(dc TextCleanserDecorator) Doc2Words
func GetWordsFactory ¶
func GetWordsFactory(dc TextCleanserDecorator) Doc2Words
type TextCleanser ¶
TextCleanser defines the function type for text cleansing
Example ¶
To run this example as a standalone program, change the package to `main` and the next function definition to `func main()`.
package main

import (
	"fmt"

	"github.com/go-dedup/text"
)

// for standalone test, change package to `main` and the next func def to,
// func main() {
func main() {
	s := "Hello~~, play_ground#5!"
	var fn text.TextCleanser = text.Ident
	fmt.Println(fn(s))

	var fn2 = text.ToLower(fn)
	fmt.Println(fn2(s))

	var fn3 text.TextCleanser = text.Ident
	fn3 = text.ToAppend(" -GOLANG")(text.ToLower(text.ToPrepend("DECORATED: ")(fn3)))
	fmt.Println(fn3(s))

	// dec is now a text.TextCleanserDecorator, to use it, you still need to
	// pass it the function of type text.TextCleanser that you want to decorate.
	dec := text.Decorators(
		text.ToAppend(" -GOLANG"),
		text.SplitCamelCase,
		text.ToLower,
		text.ToPrepend("DECORATED: "),
		text.RemovePunctuation,
	)
	fn4 := dec(text.Ident)
	fmt.Println(fn4(s))

	s += "\n.\n%% Something extra: UpperCamelCase and someInitMethod.\n"
	fmt.Printf(".\n>>>>\n'%s'\n", s)
	fmt.Printf("%#v\n", text.GetWords(s, dec))
	dec = text.Decorators(
		dec,
		text.Compact,
	)
	fmt.Printf("%#v\n", text.GetWords(s, dec))
	fn5 := text.GetWordsFactory(dec)
	fmt.Printf("%#v\n", fn5(s))

	s = "Andrej cabrillo Gallegos Germany Jankelowicz"
	fmt.Printf(".\n>>>>\n'%s'\n", s)
	dec = text.Decorators(
		text.ToDoubleMetaphone,
	)
	fmt.Printf("%#v\n", text.GetWords(s, dec))
	fmt.Printf("%#v\n", text.GetDoubleMetaphone(s, text.Decorators()))
	dec = text.Decorators(
		text.SplitCamelCase,
		text.Compact,
	)
	fn5 = text.GetDoubleMetaphoneFactory(dec)
	fmt.Printf("%#v\n", fn5(s))

	s = "NãoMeFazMal ÇaNeMeFaitPasMal PòssoMangiâFàMâ"
	fmt.Printf(".\n>>>>\n'%s'\n", s)
	dec = text.Decorators(
		text.SplitCamelCaseUnicode,
	)
	fmt.Printf("%#v\n", text.GetWords(s, dec))
}

// to show the full code in GoDoc
type dummy struct {
}
Output:
Hello~~, play_ground#5!
hello~~, play_ground#5!
DECORATED: hello~~, play_ground#5! -golang
DECORATED hello play ground 5 golang
.
>>>>
'Hello~~, play_ground#5!
.
%% Something extra: UpperCamelCase and someInitMethod.
'
[]string{"DECORATED", "hello", "", "", "play", "ground", "5", "", "", "", "", "", "", "something", "extra", "upper", "camel", "case", "and", "some", "init", "method", "", "", "", "golang"}
[]string{"DECORATED", "hello", "play", "ground", "5", "something", "extra", "upper", "camel", "case", "and", "some", "init", "method", "golang"}
[]string{"DECORATED", "hello", "play", "ground", "5", "something", "extra", "upper", "camel", "case", "and", "some", "init", "method", "golang"}
.
>>>>
'Andrej cabrillo Gallegos Germany Jankelowicz'
[]string{"antrjkprlklkskrmnjnklts", "antrkprkksjrmnanklfx"}
[]string{"antrj", "antr", "kprl", "kpr", "klks", "kks", "krmn", "jrmn", "jnklts", "anklfx"}
[]string{"antrj", "antr", "kprl", "kpr", "klks", "kks", "krmn", "jrmn", "jnklts", "anklfx"}
.
>>>>
'NãoMeFazMal ÇaNeMeFaitPasMal PòssoMangiâFàMâ'
[]string{"Não", "Me", "Faz", "Mal", "Ça", "Ne", "Me", "Fait", "Pas", "Mal", "Pòsso", "Mangiâ", "Fà", "Mâ"}
func Compact ¶
func Compact(c TextCleanser) TextCleanser
Compact cleanses each run of consecutive punctuation into a single space.
func RemovePunctuation ¶
func RemovePunctuation(c TextCleanser) TextCleanser
RemovePunctuation removes all punctuation from the text.
func SplitCamelCase ¶
func SplitCamelCase(c TextCleanser) TextCleanser
SplitCamelCase splits each CamelCase word in the text into individual words.
func SplitCamelCaseUnicode ¶
func SplitCamelCaseUnicode(c TextCleanser) TextCleanser
SplitCamelCaseUnicode splits each CamelCase word in the text into individual words and is Unicode-aware.
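As a rough illustration of Unicode-aware camel-case splitting, the sketch below inserts a space wherever a lowercase rune is followed by an uppercase rune. The `splitCamel` helper is hypothetical and is not the package's implementation, which may treat acronyms, digits, and other boundaries differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitCamel is a hypothetical helper: it inserts a space at every
// lowercase-to-uppercase boundary, using Unicode case classification
// so that letters such as "ã" and "Ç" are handled like ASCII letters.
func splitCamel(s string) string {
	var b strings.Builder
	var prev rune
	for i, r := range s {
		if i > 0 && unicode.IsLower(prev) && unicode.IsUpper(r) {
			b.WriteByte(' ')
		}
		b.WriteRune(r)
		prev = r
	}
	return b.String()
}

func main() {
	fmt.Println(splitCamel("myConstantVariable"))
	fmt.Println(splitCamel("\u00c7aNeMeFaitPasMal"))
}
```

Because the rule only fires on a lowercase-to-uppercase transition, all-caps words such as PYTHON pass through this sketch unchanged.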
func ToDoubleMetaphone ¶
func ToDoubleMetaphone(c TextCleanser) TextCleanser
ToDoubleMetaphone transforms the text into its Double Metaphone encodings.
type TextCleanserDecorator ¶
type TextCleanserDecorator func(TextCleanser) TextCleanser
TextCleanserDecorator is a decorator for text cleansing functions: it takes a TextCleanser and returns a decorated TextCleanser.
func Decorators ¶
func Decorators(ds ...TextCleanserDecorator) TextCleanserDecorator
Decorators "merges" the passed-in decorators and returns a single decorator.
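One way such a merge could work is sketched below with local stand-in types. This is not the package's source: the lowercase names are hypothetical, and the ordering (each decorator transforms the text before delegating to the cleanser it wraps, with the first listed decorator running first) is an assumption inferred from the example output on this page, where the prepended "DECORATED" stays uppercase while the appended "-GOLANG" is lowercased.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins mirroring the documented signatures.
type TextCleanser func(string) string
type TextCleanserDecorator func(TextCleanser) TextCleanser

// ident returns its input unchanged (stand-in for Ident).
func ident(s string) string { return s }

// toLower lowercases the text, then hands it to the wrapped cleanser.
func toLower(c TextCleanser) TextCleanser {
	return func(s string) string { return c(strings.ToLower(s)) }
}

// toPrepend returns a decorator that adds prefix before delegating.
func toPrepend(prefix string) TextCleanserDecorator {
	return func(c TextCleanser) TextCleanser {
		return func(s string) string { return c(prefix + s) }
	}
}

// decorators merges ds into one decorator. Wrapping from last to first
// makes the first listed decorator the outermost, so it sees the text first.
func decorators(ds ...TextCleanserDecorator) TextCleanserDecorator {
	return func(c TextCleanser) TextCleanser {
		for i := len(ds) - 1; i >= 0; i-- {
			c = ds[i](c)
		}
		return c
	}
}

func main() {
	clean := decorators(toLower, toPrepend("DECORATED: "))(ident)
	fmt.Println(clean("Hello World")) // DECORATED: hello world
}
```

With this ordering the prefix is added after lowercasing, so it keeps its original case; listing `toPrepend` first would lowercase the prefix as well.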
func ToAppend ¶
func ToAppend(suffix string) TextCleanserDecorator
ToAppend manipulates the text by appending a suffix.
func ToPrepend ¶
func ToPrepend(prefix string) TextCleanserDecorator
ToPrepend manipulates the text by prepending a prefix.