Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type ArticleExtractor
type ArticleExtractor struct {
	// contains filtered or unexported fields
}
func NewArticleExtractor
func NewArticleExtractor(logger logutil.Logger) *ArticleExtractor
func (*ArticleExtractor) Extract
func (ae *ArticleExtractor) Extract(doc *webdoc.TextDocument, wc stringutil.WordCounter, candidateTitles []string) bool
Extract extracts the content of a TextDocument. It is tuned towards news articles.
type ContentExtractor
type ContentExtractor struct {
	Parser      *markup.Parser
	TimingInfo  *data.TimingInfo
	ImageURLs   []string
	WordCounter stringutil.WordCounter
	// contains filtered or unexported fields
}
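The WordCounter field accepts a stringutil.WordCounter, and Extract takes one as its wc argument. This page does not show that interface, so the sketch below assumes it exposes a single Count(string) int method. Both the local WordCounter interface and the whitespaceCounter type are hypothetical stand-ins, shown only to illustrate what a pluggable word counter looks like.

```go
package main

import (
	"fmt"
	"strings"
)

// WordCounter is a hypothetical local mirror of stringutil.WordCounter.
// Assumption: the real interface has a single Count(string) int method.
type WordCounter interface {
	Count(text string) int
}

// whitespaceCounter is a hypothetical implementation that treats any run
// of Unicode whitespace as a word separator.
type whitespaceCounter struct{}

// Count returns the number of whitespace-separated fields in text.
func (whitespaceCounter) Count(text string) int {
	return len(strings.Fields(text))
}

func main() {
	var wc WordCounter = whitespaceCounter{}
	fmt.Println(wc.Count("  Breaking news:\tmarkets rally  ")) // 4
}
```

A whitespace-based counter undercounts text in scripts that do not separate words with spaces, such as Chinese or Japanese. That is one plausible reason the counter is supplied as a field rather than hard-coded.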
func NewContentExtractor
func (*ContentExtractor) ExtractContent
func (ce *ContentExtractor) ExtractContent() (*webdoc.Document, int)
func (*ContentExtractor) ExtractTitle
func (ce *ContentExtractor) ExtractTitle() string
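ExtractContent and ExtractTitle are both exported. A caller that only needs the title and the extracted document can therefore depend on a small consumer-side interface instead of the concrete *ContentExtractor, which makes it easy to substitute a fake in tests. The sketch below is hypothetical. Document stands in for webdoc.Document, fakeExtractor is an invented test double, and the int returned by ExtractContent is ignored because this page does not document its meaning. In real code the interface would name *webdoc.Document so that *ContentExtractor satisfies it.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for webdoc.Document.
type Document struct {
	Text string
}

// contentSource lists the two documented ContentExtractor methods,
// with the return type swapped for the local stand-in.
type contentSource interface {
	ExtractContent() (*Document, int)
	ExtractTitle() string
}

// fakeExtractor is a hypothetical test double returning fixed values.
type fakeExtractor struct{}

func (fakeExtractor) ExtractContent() (*Document, int) { return &Document{Text: "body"}, 1 }
func (fakeExtractor) ExtractTitle() string             { return "Title" }

// summarize combines the title and extracted text. The int result is
// discarded because its meaning is not specified here.
func summarize(src contentSource) string {
	doc, _ := src.ExtractContent()
	return src.ExtractTitle() + ": " + doc.Text
}

func main() {
	fmt.Println(summarize(fakeExtractor{})) // Title: body
}
```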