Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type NumWordsRulesClassifier ¶
type NumWordsRulesClassifier struct{}
NumWordsRulesClassifier classifies each TextBlock as content or not-content using rules derived with the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010). The rules rely mainly on the number of words and the link density of each block.
func NewNumWordsRulesClassifier ¶
func NewNumWordsRulesClassifier() *NumWordsRulesClassifier
func (*NumWordsRulesClassifier) Process ¶
func (f *NumWordsRulesClassifier) Process(doc *webdoc.TextDocument) bool
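The following is a minimal, self-contained sketch of how a rule-based classifier of this kind could work. It does not use the package's real types: `block`, `classify`, and `process` are hypothetical names, the thresholds are illustrative assumptions rather than the values used by NumWordsRulesClassifier, and the boolean result is assumed to report whether any block changed.

```go
package main

import "fmt"

// block is a hypothetical stand-in for a text block; it is NOT the
// package's webdoc.TextBlock type.
type block struct {
	numWords    int
	linkDensity float64
	isContent   bool
}

// classify applies a small decision tree to the current block, looking at
// its neighbours. All thresholds here are illustrative assumptions.
func classify(prev, curr, next block) bool {
	if curr.linkDensity > 0.33 {
		return false // link-heavy blocks are treated as boilerplate
	}
	if prev.linkDensity <= 0.55 {
		// Long blocks, or short blocks followed by a long one, are content.
		if curr.numWords > 16 || next.numWords > 15 {
			return true
		}
		return prev.numWords > 4
	}
	// The previous block was link-heavy, so require more words.
	if curr.numWords > 40 {
		return true
	}
	return next.numWords > 17
}

// process classifies every block, treating missing neighbours as empty
// blocks, and reports whether any isContent flag changed (an assumption
// about what the boolean result means).
func process(blocks []block) bool {
	changed := false
	for i := range blocks {
		var prev, next block
		if i > 0 {
			prev = blocks[i-1]
		}
		if i+1 < len(blocks) {
			next = blocks[i+1]
		}
		if c := classify(prev, blocks[i], next); c != blocks[i].isContent {
			blocks[i].isContent = c
			changed = true
		}
	}
	return changed
}

func main() {
	blocks := []block{
		{numWords: 3, linkDensity: 1.0},   // navigation-like block
		{numWords: 60, linkDensity: 0.02}, // article-like paragraph
		{numWords: 5, linkDensity: 0.9},   // footer-like block
	}
	fmt.Println(process(blocks), blocks)
}
```

Only word counts and link densities of a block and its two neighbours are consulted, which is what makes this family of rules cheap to evaluate.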
type TerminatingBlocksFinder ¶
type TerminatingBlocksFinder struct{}
TerminatingBlocksFinder finds blocks that potentially indicate the end of an article's text and marks them with label.StrictlyNotContent.
func NewTerminatingBlocksFinder ¶
func NewTerminatingBlocksFinder() *TerminatingBlocksFinder
func (*TerminatingBlocksFinder) Process ¶
func (f *TerminatingBlocksFinder) Process(doc *webdoc.TextDocument) bool
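As a rough illustration of the idea, the sketch below marks short blocks whose text begins with a phrase that typically follows an article. The `block` type, the `endPrefixes` list, the 15-word cutoff, and `markTerminating` are all hypothetical and are not taken from the package; the `strictlyNotContent` field merely stands in for the label.StrictlyNotContent label.

```go
package main

import (
	"fmt"
	"strings"
)

// block is a hypothetical stand-in for a text block; strictlyNotContent
// plays the role of the label.StrictlyNotContent label.
type block struct {
	text               string
	numWords           int
	strictlyNotContent bool
}

// endPrefixes is an assumed list of phrases that often follow an article.
var endPrefixes = []string{"comments", "post a comment", "please rate this"}

// markTerminating flags short blocks that start with one of endPrefixes
// and reports whether any block was newly flagged.
func markTerminating(blocks []block) bool {
	changed := false
	for i := range blocks {
		b := &blocks[i]
		// Skip long blocks (assumed cutoff) and blocks already flagged.
		if b.numWords >= 15 || b.strictlyNotContent {
			continue
		}
		text := strings.ToLower(strings.TrimSpace(b.text))
		for _, p := range endPrefixes {
			if strings.HasPrefix(text, p) {
				b.strictlyNotContent = true
				changed = true
				break
			}
		}
	}
	return changed
}

func main() {
	blocks := []block{
		{text: "A long article paragraph.", numWords: 4},
		{text: "Post a comment", numWords: 3},
	}
	fmt.Println(markTerminating(blocks), blocks)
}
```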