Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ParseHTML model.ParseFunc = func(reader io.ReadCloser, options ...model.ParseOption) *model.ParseOutput {
	defer reader.Close()

	// Apply the functional options to a fresh parse configuration.
	c := &model.ParseConfig{}
	for _, option := range options {
		option(c)
	}

	if c.Verbose {
		fmt.Println("--> parsing HTML...")
	}

	var err error
	var contents *htmlContents
	var parseFn parseFunc = parseHTML

	// Fall back to the default tag weights unless custom ones were provided.
	var tagWeights model.TagWeights
	if len(c.TagWeights) == 0 {
		tagWeights = defaultTagWeights
	} else {
		tagWeights = c.TagWeights
	}

	// In full-site mode, crawl the site starting from the source URL;
	// otherwise parse only the given reader.
	if c.FullSite && c.Source != "" {
		var crawler *webCrawler
		crawler, err = newWebCrawler(parseFn, tagWeights, c.Source, c.Verbose)
		if err != nil {
			return &model.ParseOutput{Err: err}
		}
		contents = crawler.run(reader)
	} else {
		contents = parseFn(reader, tagWeights, nil)
	}
	if err != nil {
		return &model.ParseOutput{Err: err}
	}

	if len(contents.lines) == 0 {
		return &model.ParseOutput{}
	}

	tags, title := tagifyHTML(contents, tagWeights, c.Verbose, c.NoStopWords, c.ContentOnly)

	return &model.ParseOutput{Tags: tags, DocTitle: title, DocHash: contents.hash()}
}
ParseHTML receives raw HTML markup text (e.g. fetched from the Web) and returns the plain text plus a list of prioritised tags, ranked by the importance of the HTML tags that wrap each sentence.
Example:
<h1>A story about foo</h1> <p>Foo was a good guy but had quite poor time management skills, and therefore he had issues with shipping all his tasks. Though foo had heaps of other amazing skills, which gained him a fortune.</p>
Result:
foo: 2 + 1 = 3, story: 2, management: 1 + 1 = 2, skills: 1 + 1 = 2.
Each occurrence contributes the weight of the tag that wraps it, so a word inside the <h1> heading scores more than the same word inside a <p> paragraph.
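The arithmetic can be reproduced with a small weighted-count sketch. The weights below (2 for <h1>, 1 for <p>) are inferred from the numbers above rather than documented defaults, and the real parser does more (sentence splitting, stop-word filtering):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Assumed per-tag weights, inferred from the example result above.
	weights := map[string]int{"h1": 2, "p": 1}

	parts := []struct{ tag, text string }{
		{"h1", "a story about foo"},
		{"p", "foo was a good guy but had quite poor time management skills"},
	}

	// Each word scores the weight of the tag that wraps it.
	scores := map[string]int{}
	for _, part := range parts {
		for _, word := range strings.Fields(part.text) {
			scores[word] += weights[part.tag]
		}
	}

	fmt.Println(scores["foo"])    // 3: 2 from the <h1> plus 1 from the <p>
	fmt.Println(scores["skills"]) // 1 here; the full example text has it twice
}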
The resulting ParseOutput carries a slice of tags (Tags), the title of the page (DocTitle) and a version of the document based on the hashed contents (DocHash).
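A minimal end-to-end sketch of calling ParseHTML follows. The import paths are illustrative, and the option is written as a plain closure over *model.ParseConfig, since no named option constructors appear on this page:

package main

import (
	"fmt"
	"io"
	"strings"

	"github.com/zoomio/tagify/model"               // illustrative path
	html "github.com/zoomio/tagify/processor/html" // illustrative path
)

func main() {
	doc := "<h1>A story about foo</h1><p>Foo was a good guy but had quite poor time management skills.</p>"

	// ParseHTML takes an io.ReadCloser, so wrap the in-memory string.
	reader := io.NopCloser(strings.NewReader(doc))

	// Options mutate the ParseConfig; a closure avoids guessing at
	// named option constructors.
	noStopWords := func(c *model.ParseConfig) { c.NoStopWords = true }

	out := html.ParseHTML(reader, noStopWords)
	if out.Err != nil {
		fmt.Println("parse error:", out.Err)
		return
	}

	fmt.Println("title:", out.DocTitle)
	fmt.Println("hash:", out.DocHash)
	for _, tag := range out.Tags {
		fmt.Printf("%+v\n", tag)
	}
}

The same closure pattern works for any other ParseConfig field shown above, e.g. setting Source and FullSite to enable the crawling branch.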
Functions ¶
This section is empty.
Types ¶
This section is empty.