processor

package
v0.18.0
Warning

This package is not in the latest version of its module.

Published: Jul 14, 2019 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Normalize added in v0.8.0

func Normalize(word string, noStopWords bool) (string, bool)

Normalize sanitizes word and reports whether it is an allowed token.
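
The contract described above (a cleaned word plus an "allowed" flag) could look roughly like the following. This is a minimal sketch, not the package's implementation: the function name normalizeSketch, the stop-word list, the cleaning rules, and the reading of noStopWords as "reject stop words when true" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a tiny, hypothetical stop-word list used only in this sketch.
var stopWords = map[string]bool{"a": true, "the": true, "was": true, "but": true}

// normalizeSketch returns the cleaned word and whether it is an allowed token.
// The cleaning rules are assumptions, not the real rules used by Normalize.
func normalizeSketch(word string, noStopWords bool) (string, bool) {
	// Lower-case the word and trim non-alphanumeric runes from both ends.
	w := strings.TrimFunc(strings.ToLower(word), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	if w == "" {
		return "", false // nothing left after cleaning
	}
	if noStopWords && stopWords[w] {
		return w, false // stop words are rejected only when requested
	}
	return w, true
}

func main() {
	for _, in := range []string{"Foo,", "the", "--"} {
		w, ok := normalizeSketch(in, true)
		fmt.Printf("%q -> %q allowed=%v\n", in, w, ok)
	}
}
```

Returning the cleaned word even when it is rejected lets a caller log or inspect what was dropped; whether the real Normalize does the same is not stated in the documentation.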

func ToStrings added in v0.8.0

func ToStrings(items []*Tag) []string

ToStrings transforms the given list of tags into a list of strings.
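
As an illustration of the transformation, the sketch below collects one string per tag. It assumes the string taken from each tag is its Value field, which the documentation does not confirm; toStringsSketch is a hypothetical name, and the Tag type is copied from this package's documented struct so the example is self-contained.

```go
package main

import "fmt"

// Tag is a local copy of the documented struct, so the sketch compiles alone.
type Tag struct {
	Value string
	Score float64
	Count int
}

// toStringsSketch returns the Value of every non-nil tag, preserving order.
// Using Value (rather than String()) is an assumption of this sketch.
func toStringsSketch(items []*Tag) []string {
	out := make([]string, 0, len(items))
	for _, t := range items {
		if t == nil {
			continue // skip nil entries instead of panicking
		}
		out = append(out, t.Value)
	}
	return out
}

func main() {
	tags := []*Tag{
		{Value: "foo", Score: 3, Count: 2},
		{Value: "story", Score: 2, Count: 1},
	}
	fmt.Println(toStringsSketch(tags))
}
```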

Types

type Tag

type Tag struct {
	Value string
	Score float64
	Count int
}

Tag holds some arbitrary string value (e.g. a word) along with some extra data about it.

func ParseHTML

func ParseHTML(html []string, verbose, noStopWords bool) []*Tag

ParseHTML receives lines of raw HTML markup and returns a list of tags prioritised by the importance of the HTML tags that wrap each sentence.

Example:

<h1>A story about foo
<p> Foo was a good guy but, had a quite poor time management skills,
therefore he had issues with shipping all his tasks. Though foo had heaps
of other amazing skills, which gained him a fortune.

Result:

foo: 2 + 1 = 3, story: 2, management: 1 + 1 = 2, skills: 1 + 1 = 2.

func ParseText

func ParseText(text []string, noStopWords bool) []*Tag

ParseText parses the given lines of text into a slice of tags.
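
One plausible shape for turning lines into tags is a word-frequency pass, sketched below. Everything here is an assumption for illustration: the name parseTextSketch, the tokenisation rules, and the choice to add 1 to both Score and Count per occurrence. The real ParseText also takes a noStopWords flag, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag is a local copy of the documented struct, so the sketch compiles alone.
type Tag struct {
	Value string
	Score float64
	Count int
}

// parseTextSketch splits each line into words and builds one tag per distinct
// word, in first-seen order. Scoring one point per occurrence is an assumption.
func parseTextSketch(lines []string) []*Tag {
	index := map[string]*Tag{}
	var tags []*Tag
	for _, line := range lines {
		for _, w := range strings.Fields(strings.ToLower(line)) {
			w = strings.Trim(w, ".,;:!?\"'()")
			if w == "" {
				continue
			}
			t, seen := index[w]
			if !seen {
				t = &Tag{Value: w}
				index[w] = t
				tags = append(tags, t)
			}
			t.Count++
			t.Score++
		}
	}
	return tags
}

func main() {
	for _, t := range parseTextSketch([]string{"Foo had skills.", "foo shipped"}) {
		fmt.Printf("%s score=%.0f count=%d\n", t.Value, t.Score, t.Count)
	}
}
```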

func Run

func Run(items []*Tag, limit int) []*Tag

Run first sorts the given list, then iterates over it and de-dupes items by merging inflections, then sorts the de-duped list again and returns at most limit items (or everything if the result is smaller than limit).
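
The sort, merge, re-sort, truncate sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: runSketch, sortTags, and stem are hypothetical names, tags are assumed to be ordered by descending Score, and "merging inflections" is approximated by a deliberately naive rule that strips a single trailing "s".

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag is a local copy of the documented struct, so the sketch compiles alone.
type Tag struct {
	Value string
	Score float64
	Count int
}

// sortTags orders tags by descending score, breaking ties by value so the
// result is deterministic. The ordering criterion is an assumption.
func sortTags(items []*Tag) {
	sort.SliceStable(items, func(i, j int) bool {
		if items[i].Score != items[j].Score {
			return items[i].Score > items[j].Score
		}
		return items[i].Value < items[j].Value
	})
}

// stem is a naive, hypothetical inflection rule: drop one trailing "s".
func stem(v string) string {
	if len(v) > 3 {
		return strings.TrimSuffix(v, "s")
	}
	return v
}

// runSketch sorts, merges tags that share a stem, sorts again, and truncates.
func runSketch(items []*Tag, limit int) []*Tag {
	sorted := make([]*Tag, len(items))
	copy(sorted, items)
	sortTags(sorted)

	merged := map[string]*Tag{}
	var out []*Tag
	for _, t := range sorted {
		key := stem(t.Value)
		if m, ok := merged[key]; ok {
			// Fold the lower-ranked inflection into the one seen first.
			m.Score += t.Score
			m.Count += t.Count
			continue
		}
		c := *t // copy so the caller's tags are not mutated
		merged[key] = &c
		out = append(out, &c)
	}

	sortTags(out) // merged scores may have changed the order
	if limit >= 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func main() {
	in := []*Tag{
		{Value: "skill", Score: 1, Count: 1},
		{Value: "foo", Score: 3, Count: 2},
		{Value: "skills", Score: 2, Count: 1},
		{Value: "story", Score: 2, Count: 1},
	}
	for _, t := range runSketch(in, 2) {
		fmt.Printf("%s score=%.0f count=%d\n", t.Value, t.Score, t.Count)
	}
}
```

The second sort matters because merging can lift a tag above ones that originally outranked it; here "skills" absorbs "skill" and moves ahead of "story".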


func (*Tag) String

func (t *Tag) String() string
