package processor

Version: v0.11.0
Warning

This package is not in the latest version of its module.

Published: Oct 6, 2018 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Normalize added in v0.8.0

func Normalize(word string, filterByStopWords bool) (string, bool)

Normalize sanitizes word and reports whether it is an allowed token.
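The documentation does not say what "sanitizes" involves. The sketch below shows one plausible approach: lower-casing, trimming surrounding punctuation, and optionally rejecting stop words. The helper `normalizeWord` and the tiny `stopWords` list are hypothetical and are not the package's actual `Normalize` or its stop-word list.

```go
package main

import (
	"strings"
	"unicode"
)

// stopWords is a tiny, hypothetical stop-word list used only for illustration.
var stopWords = map[string]bool{
	"a": true, "about": true, "all": true, "but": true, "had": true,
	"he": true, "his": true, "of": true, "the": true, "was": true, "with": true,
}

// normalizeWord is a hypothetical stand-in for Normalize. It trims leading and
// trailing characters that are neither letters nor digits, lower-cases the
// result, and reports whether the token should be kept.
func normalizeWord(word string, filterByStopWords bool) (string, bool) {
	w := strings.ToLower(strings.TrimFunc(word, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	}))
	if w == "" {
		// Nothing left after trimming: not a usable token.
		return "", false
	}
	if filterByStopWords && stopWords[w] {
		// Sanitized, but rejected as a stop word.
		return w, false
	}
	return w, true
}
```

With this sketch, `normalizeWord("Foo,", true)` yields `"foo", true`, while `normalizeWord("The", true)` yields `"the", false`.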

func ToStrings added in v0.8.0

func ToStrings(items []*Tag) []string

ToStrings converts a list of tags into a list of strings.
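Judging only from the signature, a minimal version might collect each tag's `Value` in order. The sketch below assumes exactly that; the real `ToStrings` may instead format tags differently (for example via `String`). The `Tag` type mirrors the struct listed under Types, and `toStrings` is a hypothetical name.

```go
package main

// Tag mirrors the exported struct listed under Types.
type Tag struct {
	Value string
	Score float64
	Count int
}

// toStrings is a hypothetical stand-in for ToStrings: it returns the Value of
// every tag, preserving the input order.
func toStrings(items []*Tag) []string {
	out := make([]string, 0, len(items))
	for _, t := range items {
		out = append(out, t.Value)
	}
	return out
}
```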

Types

type Tag

type Tag struct {
	Value string
	Score float64
	Count int
}

Tag holds an arbitrary string value (e.g. a word) along with extra data about it: a Score and a Count.

func ParseHTML

func ParseHTML(lines []string, verbose, doFiltering bool) ([]*Tag, []string)

ParseHTML receives lines of raw text from the Web and produces a list of tags prioritised by the importance of the HTML tags that wrap each sentence.

Example:

<h1>A story about foo
<p> Foo was a good guy but, had a quite poor time management skills,
therefore he had issues with shipping all his tasks. Though foo had heaps
of other amazing skills, which gained him a fortune.

Result:

foo: 2 + 1 = 3, story: 2, management: 1 + 1 = 2, skills: 1 + 1 = 2.
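The example suggests that a word inside `<h1>` contributes 2 and a word inside `<p>` contributes 1. The sketch below illustrates only that weighting idea, under the assumption that each line starts with an optional opening tag whose weight carries over to continuation lines. The weights in `tagWeights` and the helper `scoreHTMLLines` are hypothetical; the sketch applies no stop-word filtering or de-duplication, so it does not reproduce the exact numbers in the result above.

```go
package main

import "strings"

// tagWeights is a hypothetical weighting inferred from the example:
// <h1> counts 2 per occurrence, <p> counts 1.
var tagWeights = map[string]float64{"<h1>": 2, "<p>": 1}

// scoreHTMLLines strips a leading HTML tag from each line, carries the current
// weight forward to lines without a tag, and sums weights per lower-cased word.
func scoreHTMLLines(lines []string) map[string]float64 {
	scores := make(map[string]float64)
	weight := 1.0 // default weight for text before any recognised tag
	for _, line := range lines {
		line = strings.TrimSpace(line)
		for tag, w := range tagWeights {
			if strings.HasPrefix(line, tag) {
				weight = w
				line = strings.TrimPrefix(line, tag)
				break
			}
		}
		for _, word := range strings.Fields(line) {
			word = strings.ToLower(strings.Trim(word, ".,;:!?"))
			if word != "" {
				scores[word] += weight
			}
		}
	}
	return scores
}
```

For instance, `scoreHTMLLines([]string{"<h1>A story about foo", "<p> Foo was good,", "and foo won."})` gives `foo` a score of 4 (2 from the heading plus 1 for each paragraph occurrence) and `story` a score of 2.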

func ParseText

func ParseText(lines []string, filterByStopWords bool) []*Tag

ParseText parses lines of plain text into tags; filterByStopWords controls whether stop words are filtered out.
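A minimal sketch of what text parsing into tags could look like, assuming each distinct word becomes one `Tag` whose `Count` is its number of occurrences and whose `Score` simply equals that count. The helper `parseTextSketch`, the stop-word list, and the scoring rule are all hypothetical; the package's actual `ParseText` may score words differently.

```go
package main

import "strings"

// Tag mirrors the exported struct listed under Types.
type Tag struct {
	Value string
	Score float64
	Count int
}

// stopWords is a tiny, hypothetical stop-word list.
var stopWords = map[string]bool{"a": true, "and": true, "of": true, "the": true}

// parseTextSketch splits lines into lower-cased words, optionally drops stop
// words, and returns one Tag per distinct word in first-seen order.
func parseTextSketch(lines []string, filterByStopWords bool) []*Tag {
	index := make(map[string]*Tag)
	var tags []*Tag
	for _, line := range lines {
		for _, word := range strings.Fields(line) {
			w := strings.ToLower(strings.Trim(word, ".,;:!?\"'()"))
			if w == "" || (filterByStopWords && stopWords[w]) {
				continue
			}
			t, ok := index[w]
			if !ok {
				t = &Tag{Value: w}
				index[w] = t
				tags = append(tags, t)
			}
			t.Count++
			t.Score = float64(t.Count) // assumption: score is the raw frequency
		}
	}
	return tags
}
```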

func Run

func Run(items []*Tag, limit int) []*Tag

Run first sorts the given list, then iterates over it and de-duplicates items by merging inflections (different forms of the same word). Finally, it sorts the de-duplicated list again and returns only the requested number of items (limit), or all of them if the result is smaller than limit.
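The sort, merge, re-sort, and truncate pipeline can be sketched as follows. How the package detects inflections is not documented, so this sketch swaps in a deliberately naive rule (a word longer than three letters ending in "s" is merged with its form without the "s"). The helpers `stem`, `sortTags`, and `runSketch` are hypothetical, as is the choice to sort by descending `Score` with alphabetical tie-breaking.

```go
package main

import (
	"sort"
	"strings"
)

// Tag mirrors the exported struct listed under Types.
type Tag struct {
	Value string
	Score float64
	Count int
}

// stem is a naive, hypothetical inflection rule: drop a trailing "s"
// from words longer than three letters ("skills" -> "skill").
func stem(w string) string {
	if len(w) > 3 && strings.HasSuffix(w, "s") {
		return strings.TrimSuffix(w, "s")
	}
	return w
}

// sortTags orders tags by descending Score, breaking ties alphabetically.
func sortTags(items []*Tag) {
	sort.SliceStable(items, func(i, j int) bool {
		if items[i].Score != items[j].Score {
			return items[i].Score > items[j].Score
		}
		return items[i].Value < items[j].Value
	})
}

// runSketch sorts, merges tags sharing a stem into the highest-ranked form,
// re-sorts, and keeps at most limit tags.
func runSketch(items []*Tag, limit int) []*Tag {
	sorted := append([]*Tag(nil), items...) // do not reorder the caller's slice
	sortTags(sorted)
	merged := make(map[string]*Tag)
	var deduped []*Tag
	for _, t := range sorted {
		key := stem(t.Value)
		if kept, ok := merged[key]; ok {
			kept.Score += t.Score
			kept.Count += t.Count
			continue
		}
		c := *t // copy so the caller's tags are not mutated
		merged[key] = &c
		deduped = append(deduped, &c)
	}
	sortTags(deduped)
	if limit >= 0 && len(deduped) > limit {
		deduped = deduped[:limit]
	}
	return deduped
}
```

Copying each kept tag before merging keeps the function free of side effects on its input, at the cost of one allocation per distinct stem.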


func (*Tag) String

func (t *Tag) String() string