Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
	ErrNoChunks    = errors.New("document contains no chunks")
	ErrEmptyResult = errors.New("nothing found")
)
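Assuming these are the errors returned by (*Extractor).Extract (this page does not state it explicitly), a caller could tell them apart roughly as in the fragment below. It presumes an "errors" import and an extractor ext plus a parsed document doc already in scope, with the package qualifier omitted.

	chunks, err := ext.Extract(doc)
	switch {
	case errors.Is(err, ErrNoChunks):
		// the document could not be split into text chunks at all
	case errors.Is(err, ErrEmptyResult):
		// chunks were found, but none were classified as relevant
	case err != nil:
		// some other failure
	}
	_ = chunks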
Functions ¶
This section is empty.
Types ¶
type Extractor ¶
type Extractor struct {
Labels []bool
}
Extractor utilizes the trained model to extract relevant html.Chunks from an html.Document.
func NewExtractor ¶
func NewExtractor() *Extractor
NewExtractor creates and initializes a new Extractor.
func (*Extractor) Extract ¶
Extract returns a list of relevant text chunks found in doc.
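Neither the signature of Extract nor the html package's API is shown on this page, so the following end-to-end sketch rests on assumptions: that the sibling html package offers a NewDocument constructor taking an io.Reader, that Extract accepts the resulting *html.Document and returns ([]*html.Chunk, error), and that a Chunk carries its text in a Text field. The import paths and package names are placeholders, not the real module path.

	package main

	import (
		"fmt"
		"log"
		"strings"

		"example.com/extractor/html"  // placeholder import paths; substitute
		"example.com/extractor/model" // the real module path of this package
	)

	func main() {
		page := `<html><body>
			<nav><a href="/">Home</a></nav>
			<article><p>The actual article text lives here.</p></article>
		</body></html>`

		// Assumption: the html package parses a page from an io.Reader.
		doc, err := html.NewDocument(strings.NewReader(page))
		if err != nil {
			log.Fatal(err)
		}

		ext := model.NewExtractor()

		// Assumption: Extract takes the parsed document and returns the chunks
		// classified as relevant, or ErrNoChunks / ErrEmptyResult on failure.
		chunks, err := ext.Extract(doc)
		if err != nil {
			log.Fatal(err)
		}
		for _, chunk := range chunks {
			fmt.Println(chunk.Text) // assumption: Text holds the chunk's text
		}
	}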
How it works ¶
This function creates a feature vector for each chunk found in doc. A feature vector is a numerical representation of a chunk's properties, such as its HTML element type, its parent's element type, its word count, its sentence count, and so on.
A logistic regression model calculates a score for each of these feature vectors. Then, in a stacked (meta/ensemble) learning step, a second kind of feature vector is built from these scores. This second vector is fed to a random forest, and the random forest's predictions determine which chunks end up in the result.
By now you might have noticed that I'm exceptionally bad at naming and describing things properly.
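To make that two-stage flow concrete, here is a toy sketch of the idea. Nothing in it comes from this package: the features, the logistic-regression weights, the choice of neighbor scores as the second feature vector, and the two hard-coded "trees" standing in for a random forest are all invented for illustration.

	package main

	import (
		"fmt"
		"math"
	)

	// features is an invented stage-one feature vector for a single chunk.
	type features struct {
		isParagraph float64 // 1 if the chunk's element is a <p>
		words       float64 // word count
		sentences   float64 // sentence count
	}

	// logisticScore stands in for stage one: a logistic regression maps the
	// chunk's feature vector to a relevance score in (0, 1). The weights are
	// arbitrary placeholders, not the package's trained model.
	func logisticScore(f features) float64 {
		z := 1.5*f.isParagraph + 0.05*f.words + 0.3*f.sentences - 2.0
		return 1.0 / (1.0 + math.Exp(-z))
	}

	// forestPredict stands in for stage two: a second feature vector, built
	// here from the chunk's score and its neighbors' scores, goes to an
	// ensemble of decision trees. A chunk is kept if at least one of the two
	// toy trees votes for it.
	func forestPredict(prev, cur, next float64) bool {
		tree1 := cur > 0.5
		tree2 := cur > 0.35 && (prev > 0.5 || next > 0.5)
		return tree1 || tree2
	}

	func main() {
		chunks := []features{
			{isParagraph: 0, words: 4, sentences: 1},  // e.g. a navigation link
			{isParagraph: 1, words: 60, sentences: 4}, // article text
			{isParagraph: 1, words: 45, sentences: 3}, // article text
		}

		// Stage one: score every chunk individually.
		scores := make([]float64, len(chunks))
		for i, f := range chunks {
			scores[i] = logisticScore(f)
		}

		// Stage two: build the second feature vector from neighboring scores
		// and let the "forest" decide which chunks are relevant.
		for i, score := range scores {
			prev, next := 0.0, 0.0
			if i > 0 {
				prev = scores[i-1]
			}
			if i < len(scores)-1 {
				next = scores[i+1]
			}
			fmt.Printf("chunk %d: score=%.2f relevant=%v\n", i, score, forestPredict(prev, score, next))
		}
	}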