Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func NewHTMLCommand
Types
type HTMLSplitParser
type HTMLSplitParser struct {
// contains filtered or unexported fields
}
HTMLSplitParser is a GlazeProcessor that splits an HTML document into sections. When it encounters one of the tags in splitTags, it extracts the content below that tag as the Title (if extractTitle is true), and the following siblings, up to the next split tag, as the body.
func NewHTMLHeadingSplitParser
func NewHTMLHeadingSplitParser(gp middlewares.Processor, removeTags []string) *HTMLSplitParser
NewHTMLHeadingSplitParser creates a new HTMLSplitParser that splits the document into sections at heading tags (h1, h2, h3...) and keeps the headings as titles.
func NewHTMLSplitParser
func NewHTMLSplitParser(gp middlewares.Processor, removeTags, splitTags []string, extractTitle bool) *HTMLSplitParser
func (*HTMLSplitParser) ProcessNode
ProcessNode extracts the content below a header tag and sends it to the GlazeProcessor. The header tag's content becomes the Title, and the following siblings, up to the next header tag, become the body.
It returns the next node to be parsed, because each section consumes a variable number of sibling nodes.
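The sibling-walking behaviour described above can be illustrated with a minimal, standard-library-only sketch. It is not the package's implementation: the node and section types and the isSplitTag, processNode, splitSections and linkSiblings helpers are hypothetical stand-ins invented for this example, and real code would presumably operate on parsed HTML nodes and forward each section to the GlazeProcessor rather than collect it in a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for an HTML element:
// a tag name, its text content, and a pointer to the next sibling.
type node struct {
	Tag         string
	Text        string
	NextSibling *node
}

// section is one split result: an optional title plus the body text.
type section struct {
	Title string
	Body  string
}

// isSplitTag reports whether tag is one of splitTags.
func isSplitTag(tag string, splitTags []string) bool {
	for _, t := range splitTags {
		if t == tag {
			return true
		}
	}
	return false
}

// processNode assumes n is a split tag. It collects the siblings that
// follow n until the next split tag and returns the resulting section
// together with the node where parsing should resume (nil at the end).
func processNode(n *node, splitTags []string, extractTitle bool) (section, *node) {
	var s section
	if extractTitle {
		s.Title = n.Text
	}
	var body []string
	cur := n.NextSibling
	for cur != nil && !isSplitTag(cur.Tag, splitTags) {
		body = append(body, cur.Text)
		cur = cur.NextSibling
	}
	s.Body = strings.Join(body, "\n")
	return s, cur
}

// splitSections walks a sibling chain and emits one section per split tag.
// Nodes that appear before the first split tag are skipped in this sketch.
func splitSections(first *node, splitTags []string, extractTitle bool) []section {
	var out []section
	for cur := first; cur != nil; {
		if !isSplitTag(cur.Tag, splitTags) {
			cur = cur.NextSibling
			continue
		}
		var s section
		// processNode tells us where to continue, since it consumed
		// an unknown number of siblings.
		s, cur = processNode(cur, splitTags, extractTitle)
		out = append(out, s)
	}
	return out
}

// linkSiblings chains the given nodes together and returns the first one.
func linkSiblings(nodes ...*node) *node {
	for i := 0; i+1 < len(nodes); i++ {
		nodes[i].NextSibling = nodes[i+1]
	}
	if len(nodes) == 0 {
		return nil
	}
	return nodes[0]
}

func main() {
	doc := linkSiblings(
		&node{Tag: "h1", Text: "Intro"},
		&node{Tag: "p", Text: "first paragraph"},
		&node{Tag: "p", Text: "second paragraph"},
		&node{Tag: "h2", Text: "Details"},
		&node{Tag: "p", Text: "third paragraph"},
	)
	for _, s := range splitSections(doc, []string{"h1", "h2"}, true) {
		fmt.Printf("title=%q body=%q\n", s.Title, s.Body)
	}
}
```

Returning the resume point from processNode keeps the outer loop simple: the caller never has to know how many siblings a section consumed, which mirrors the reason ProcessNode is documented as returning the next node to be parsed.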