Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ParseHTML model.ParseFunc = func(c *config.Config, reader io.ReadCloser, options ...model.ParseOption) *model.ParseOutput {
	defer reader.Close()

	pc := &model.ParseConfig{}
	for _, option := range options {
		option(pc)
	}

	if c.Verbose {
		fmt.Println("--> parsing HTML...")
	}

	var err error
	var contents *htmlContents
	var parseFn parseFunc = parseHTML

	var tagWeights model.TagWeights
	if c.TagWeights == "" {
		tagWeights = defaultTagWeights
	} else {
		tagWeights = pc.TagWeights
	}

	if c.FullSite && c.Source != "" {
		var crawler *webCrawler
		crawler, err = newWebCrawler(parseFn, tagWeights, c.Source, c.Verbose)
		if err != nil {
			return &model.ParseOutput{Err: err}
		}
		contents = crawler.run(reader)
	} else {
		contents = parseFn(reader, tagWeights, nil)
	}

	if c.Verbose {
		fmt.Println("--> parsed")
	}

	if err != nil {
		return &model.ParseOutput{Err: err}
	}

	if len(contents.lines) == 0 {
		return &model.ParseOutput{}
	}

	tags, title, lang := tagifyHTML(contents, c, tagWeights)

	return &model.ParseOutput{Tags: tags, DocTitle: title, DocHash: contents.hash(), Lang: lang}
}
ParseHTML receives lines of raw HTML markup text from the Web and returns simple text, plus a list of prioritised tags (if tagify == true) based on the importance of the HTML tags that wrap sentences.
Example:
<h1>A story about foo <p> Foo was a good guy but, had a quite poor time management skills, therefore he had issues with shipping all his tasks. Though foo had heaps of other amazing skills, which gained him a fortune.
Result:
foo: 2 + 1 = 3, story: 2, management: 1 + 1 = 2, skills: 1 + 1 = 2.
Returns a slice of tags as the first result, the title of the page as the second, and a version of the document, derived from a hash of its contents, as the third.
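The scoring arithmetic in the example above can be sketched as a small, self-contained program. This is not the library's implementation; the tag weights (h1 = 2, p = 1) and the scoreTags helper are assumptions chosen to mirror the example, where a word wrapped in a more important HTML tag accumulates a higher score:

	package main

	import (
		"fmt"
		"strings"
	)

	// Hypothetical weights mirroring the example: <h1> counts more than <p>.
	var tagWeights = map[string]float64{"h1": 2, "p": 1}

	// scoreTags accumulates a score per word across (tag, text) pairs,
	// weighting each occurrence by the HTML tag that wraps it.
	func scoreTags(lines []struct{ Tag, Text string }) map[string]float64 {
		scores := map[string]float64{}
		for _, l := range lines {
			w := tagWeights[l.Tag]
			for _, word := range strings.Fields(strings.ToLower(l.Text)) {
				word = strings.Trim(word, ".,")
				scores[word] += w
			}
		}
		return scores
	}

	func main() {
		lines := []struct{ Tag, Text string }{
			{"h1", "A story about foo"},
			{"p", "Foo had poor time management skills"},
		}
		s := scoreTags(lines)
		// "foo" scores 2 (from <h1>) + 1 (from <p>) = 3; "story" scores 2.
		fmt.Println(s["foo"], s["story"])
	}

The real parser additionally handles sentence splitting, stop words, and crawling, but the weighted accumulation per wrapping tag is the core idea.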
Functions ¶
This section is empty.
Types ¶
This section is empty.