processor

package
v0.35.0
Published: Feb 18, 2020 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SplitToSentences added in v0.24.0

func SplitToSentences(text string) []string

SplitToSentences splits the given text into a slice of sentences.
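The splitting rules are not documented on this page. As a rough illustration only, the sketch below assumes that a sentence ends at '.', '!' or '?'. The function splitToSentences is a hypothetical stand-in, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitToSentences is a hypothetical stand-in for processor.SplitToSentences.
// It assumes a sentence ends at '.', '!' or '?'; the real rules may differ.
// Abbreviations such as "e.g." and ellipses are not handled.
func splitToSentences(text string) []string {
	var sentences []string
	var b strings.Builder
	for _, r := range text {
		b.WriteRune(r)
		if r == '.' || r == '!' || r == '?' {
			if s := strings.TrimSpace(b.String()); s != "" {
				sentences = append(sentences, s)
			}
			b.Reset()
		}
	}
	// Keep any trailing text that has no terminating punctuation.
	if s := strings.TrimSpace(b.String()); s != "" {
		sentences = append(sentences, s)
	}
	return sentences
}

func main() {
	for _, s := range splitToSentences("Foo was a good guy. Was he? Yes!") {
		fmt.Println(s)
	}
}
```

This prints "Foo was a good guy.", "Was he?" and "Yes!" on separate lines.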

func ToStrings added in v0.8.0

func ToStrings(items []*Tag) []string

ToStrings transforms the given list of tags into a list of strings.
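The page does not say how each tag is rendered. A minimal sketch, assuming each tag maps to its Value field, could look like this; toStrings and the trimmed-down Tag type are hypothetical.

```go
package main

import "fmt"

// Tag mirrors only the Value and Score fields of processor.Tag for this sketch.
type Tag struct {
	Value string
	Score float64
}

// toStrings is a hypothetical equivalent of processor.ToStrings. It assumes
// each tag maps to its Value; the real function may format tags differently.
func toStrings(items []*Tag) []string {
	out := make([]string, 0, len(items))
	for _, t := range items {
		out = append(out, t.Value)
	}
	return out
}

func main() {
	tags := []*Tag{{Value: "foo", Score: 3}, {Value: "story", Score: 2}}
	fmt.Println(toStrings(tags)) // [foo story]
}
```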

Types

type InputReader added in v0.22.0

type InputReader interface {
	ReadLines() ([]string, error)
}

InputReader is implemented by any input source that can return its contents as a slice of lines.

type Tag

type Tag struct {
	// Value of the tag, e.g. a word
	Value string
	// Score represents the importance of the tag
	Score float64
	// Count is the number of times the tag appeared in a text
	Count int
	// Docs is the number of documents in a text in which the tag appeared
	Docs int
	// DocsCount is the total number of documents in a text
	DocsCount int
}

Tag holds an arbitrary string value (e.g. a word) along with statistics about it.
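The fields Count, Docs and DocsCount are the inputs a TF-IDF style weighting needs, but this page does not say how Score is actually computed. Purely as a hypothetical illustration, the sketch below derives Score as Count multiplied by log(DocsCount/Docs); tfidf is not a function of the package.

```go
package main

import (
	"fmt"
	"math"
)

// Tag repeats the fields of processor.Tag.
type Tag struct {
	Value     string
	Score     float64
	Count     int
	Docs      int
	DocsCount int
}

// tfidf is a hypothetical scoring rule: term frequency (Count) weighted by
// inverse document frequency log(DocsCount/Docs). It is not taken from the package.
func tfidf(t *Tag) float64 {
	if t.Docs == 0 || t.DocsCount == 0 {
		return 0
	}
	return float64(t.Count) * math.Log(float64(t.DocsCount)/float64(t.Docs))
}

func main() {
	common := &Tag{Value: "the", Count: 10, Docs: 4, DocsCount: 4}
	rare := &Tag{Value: "foo", Count: 3, Docs: 1, DocsCount: 4}
	common.Score, rare.Score = tfidf(common), tfidf(rare)
	fmt.Printf("%s=%.2f %s=%.2f\n", common.Value, common.Score, rare.Value, rare.Score) // the=0.00 foo=4.16
}
```

Under this rule a word that appears in every document scores zero however often it occurs, which is why such weightings favour distinctive words.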

func ParseHTML

func ParseHTML(reader io.ReadCloser, verbose, noStopWords bool) ([]*Tag, string, []byte)

ParseHTML reads raw HTML markup (e.g. a page fetched from the Web) from reader and returns a list of tags, prioritised by the importance of the HTML elements that wrap each sentence.

Example:

<h1>A story about foo
<p> Foo was a good guy but, had a quite poor time management skills,
therefore he had issues with shipping all his tasks. Though foo had heaps
of other amazing skills, which gained him a fortune.

Result:

foo: 2 + 1 = 3, story: 2, management: 1 + 1 = 2, skills: 1 + 1 = 2.

It returns a slice of tags as the 1st result, the title of the page as the 2nd, and a version of the document, derived from a hash of its contents, as the 3rd.
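The idea of weighting words by the element that wraps them can be sketched as follows. This is a hypothetical illustration, not the package's code: it assumes the text has already been split into blocks labelled with their wrapping element, uses made-up weights (h1 = 2, p = 1) and a caller-supplied stop-word set, and adds the element weight once per occurrence of a word. It is not meant to reproduce the exact numbers in the Result above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Tag keeps only the fields this sketch fills in.
type Tag struct {
	Value string
	Score float64
	Count int
}

// block is a run of text together with the HTML element that wraps it.
// Extracting blocks from real markup (e.g. with an HTML tokenizer) is omitted.
type block struct {
	Element string
	Text    string
}

// elementWeights is a hypothetical weighting; the package's real weights are not documented here.
var elementWeights = map[string]float64{"h1": 2, "p": 1}

// scoreBlocks adds the weight of the wrapping element to every word occurrence.
func scoreBlocks(blocks []block, stopWords map[string]bool) []*Tag {
	tags := map[string]*Tag{}
	for _, b := range blocks {
		w, ok := elementWeights[b.Element]
		if !ok {
			continue // unknown elements contribute nothing in this sketch
		}
		words := strings.FieldsFunc(strings.ToLower(b.Text), func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		for _, word := range words {
			if stopWords[word] {
				continue
			}
			t, ok := tags[word]
			if !ok {
				t = &Tag{Value: word}
				tags[word] = t
			}
			t.Score += w
			t.Count++
		}
	}
	out := make([]*Tag, 0, len(tags))
	for _, t := range tags {
		out = append(out, t)
	}
	// Highest score first; ties broken alphabetically for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	blocks := []block{
		{Element: "h1", Text: "A story about foo"},
		{Element: "p", Text: "Foo had good skills."},
	}
	stop := map[string]bool{"a": true, "about": true, "had": true}
	for _, t := range scoreBlocks(blocks, stop) {
		fmt.Printf("%s: %.0f\n", t.Value, t.Score)
	}
}
```

With these assumed weights the output is foo: 3, story: 2, good: 1, skills: 1, because "foo" collects 2 from the heading and 1 from the paragraph.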

func ParseText

func ParseText(in InputReader, verbose, noStopWords bool) ([]*Tag, []byte)

ParseText parses the lines of text provided by in into a slice of tags.
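A frequency-based version of this can be sketched in a few lines. Several parts are assumptions not confirmed by this page: the sketch takes the lines directly instead of an InputReader, treats noStopWords == true as "drop stop words", uses a tiny made-up stop-word list, and returns a SHA-256 digest of the input as the []byte result, by analogy with the hashed version that ParseHTML returns. parseText is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Tag keeps only the fields this sketch fills in.
type Tag struct {
	Value string
	Score float64
	Count int
}

// stopWords is a tiny hypothetical list; the real package uses its own.
var stopWords = map[string]bool{"a": true, "of": true, "the": true, "was": true}

// parseText is a hypothetical sketch, not processor.ParseText itself. It counts
// words across all lines, drops stop words when noStopWords is true, and returns
// a SHA-256 digest of the input as a content version (the hash is an assumption).
func parseText(lines []string, noStopWords bool) ([]*Tag, []byte) {
	byWord := map[string]*Tag{}
	h := sha256.New()
	for _, line := range lines {
		h.Write([]byte(line + "\n")) // hash.Hash.Write never returns an error
		words := strings.FieldsFunc(strings.ToLower(line), func(r rune) bool {
			return !unicode.IsLetter(r) && !unicode.IsDigit(r)
		})
		for _, w := range words {
			if noStopWords && stopWords[w] {
				continue
			}
			t, ok := byWord[w]
			if !ok {
				t = &Tag{Value: w}
				byWord[w] = t
			}
			t.Count++
			t.Score = float64(t.Count)
		}
	}
	tags := make([]*Tag, 0, len(byWord))
	for _, t := range byWord {
		tags = append(tags, t)
	}
	// Most frequent first; ties broken alphabetically so output is deterministic.
	sort.Slice(tags, func(i, j int) bool {
		if tags[i].Count != tags[j].Count {
			return tags[i].Count > tags[j].Count
		}
		return tags[i].Value < tags[j].Value
	})
	return tags, h.Sum(nil)
}

func main() {
	lines := []string{"Foo was a good guy.", "The story of foo."}
	tags, version := parseText(lines, true)
	for _, t := range tags {
		fmt.Printf("%s: %d\n", t.Value, t.Count)
	}
	fmt.Printf("version: %x\n", version)
}
```

Hashing the raw input means the version changes whenever the text changes, even if the resulting tags happen to stay the same.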

func Run

func Run(items []*Tag, limit int) []*Tag

Run first sorts the given list, then iterates over it and de-duplicates items by merging inflections, sorts the de-duplicated list again, and returns only the requested number of items (limit), or everything if the result is smaller than limit.
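The sort, merge, re-sort and truncate steps can be sketched as below. How the package recognises inflections is not described here, so this sketch substitutes a deliberately naive rule (drop a trailing "s" from words longer than three letters); stem and run are hypothetical. Merged forms are folded into whichever form sorted first, i.e. the higher-scored one.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tag keeps only the fields this sketch uses.
type Tag struct {
	Value string
	Score float64
	Count int
}

// stem is a deliberately naive, hypothetical inflection rule: it drops a
// trailing "s" from words longer than three letters.
func stem(word string) string {
	if len(word) > 3 && strings.HasSuffix(word, "s") {
		return strings.TrimSuffix(word, "s")
	}
	return word
}

// byScore sorts tags by descending score, keeping the original order of ties.
func byScore(tags []*Tag) {
	sort.SliceStable(tags, func(i, j int) bool { return tags[i].Score > tags[j].Score })
}

// run sorts, merges inflections into the first (highest-scored) form, re-sorts and applies limit.
func run(items []*Tag, limit int) []*Tag {
	sorted := append([]*Tag(nil), items...) // leave the caller's slice order untouched
	byScore(sorted)
	merged := map[string]*Tag{}
	var unique []*Tag
	for _, t := range sorted {
		key := stem(t.Value)
		if kept, ok := merged[key]; ok {
			kept.Score += t.Score
			kept.Count += t.Count
			continue
		}
		c := *t // copy so merging never mutates the caller's tags
		merged[key] = &c
		unique = append(unique, &c)
	}
	byScore(unique)
	if limit >= 0 && limit < len(unique) {
		unique = unique[:limit]
	}
	return unique
}

func main() {
	tags := []*Tag{
		{Value: "skill", Score: 1, Count: 1},
		{Value: "skills", Score: 2, Count: 2},
		{Value: "foo", Score: 3, Count: 2},
		{Value: "story", Score: 2, Count: 1},
	}
	for _, t := range run(tags, 2) {
		fmt.Printf("%s: %.0f\n", t.Value, t.Score) // foo: 3, then skills: 3
	}
}
```

Sorting before merging is what makes the higher-scored spelling ("skills" here) the surviving form; the second sort is needed because merging can raise a tag's score above its neighbours.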


func (*Tag) String

func (t *Tag) String() string
