Documentation ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Article ¶
type Article struct {
	// Title is the heading that precedes the article’s content, and the basis
	// for the article’s page name and URL. It indicates what the article is
	// about, and distinguishes it from other articles. The title may simply
	// be the name of the subject of the article, or it may be a description
	// of the topic.
	Title string

	// Byline is a printed line of text accompanying a news story, article, or
	// the like, giving the author’s name.
	Byline string

	// Dir is the direction of the text in the article.
	//
	// Either Left-to-Right (LTR) or Right-to-Left (RTL).
	Dir string

	// Content is the relevant text in the article with HTML tags.
	Content string

	// TextContent is the relevant text in the article without HTML tags.
	TextContent string

	// Excerpt is the summary for the relevant text in the article.
	Excerpt string

	// SiteName is the name of the original publisher website.
	SiteName string

	// Favicon (short for favorite icon) is a file containing one or more small
	// icons, associated with a particular website or web page. A web designer
	// can create such an icon and upload it to a website (or web page) by
	// several means, and graphical web browsers will then make use of it.
	Favicon string

	// Image is an image URL which represents the article’s content.
	Image string

	// Length is the number of characters in the article.
	Length int

	// Node is the first element in the HTML document.
	Node *html.Node
}
Article represents the metadata and content of the article.
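To make the field descriptions concrete, here is a minimal sketch that fills in a few of them and prints a one-line summary. The `Article` type below is a local stand-in that mirrors only some of the documented fields so the example compiles on its own; `describe` is a hypothetical helper, not part of the package. Two assumptions are worth flagging: that `Dir` holds a value such as `"ltr"` or `"rtl"`, and that `Length` counts runes rather than bytes. The documentation above does not settle either point.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Article is a local stand-in mirroring a subset of the documented fields.
// It is not the package's own type.
type Article struct {
	Title       string
	Byline      string
	Dir         string
	TextContent string
	Length      int
}

// describe is a hypothetical helper that renders a short summary line.
func describe(a Article) string {
	parts := []string{a.Title}
	if a.Byline != "" {
		parts = append(parts, "by "+a.Byline)
	}
	// Assumption: Dir is spelled "rtl" (in any letter case) for right-to-left text.
	if strings.EqualFold(a.Dir, "rtl") {
		parts = append(parts, "right-to-left")
	}
	parts = append(parts, fmt.Sprintf("%d chars", a.Length))
	return strings.Join(parts, " | ")
}

func main() {
	text := "Go is an open source programming language."
	a := Article{
		Title:       "About Go",
		Byline:      "Jane Doe",
		Dir:         "ltr",
		TextContent: text,
		// Assumption: Length counts runes; the package may count bytes instead.
		Length: utf8.RuneCountInString(text),
	}
	fmt.Println(describe(a))
}
```

With the real type, the same fields would be read from the value the parser returns instead of being filled in by hand.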
type Readability ¶
type Readability struct {
	// MaxElemsToParse is the optional maximum number of HTML nodes to parse
	// from the document. If the number of elements in the document is higher
	// than this number, the operation immediately errors.
	MaxElemsToParse int

	// NTopCandidates is the number of top candidates to consider when the
	// parser is analysing how tight the competition is among candidates.
	NTopCandidates int

	// CharThresholds is the default number of chars an article must have in
	// order to return a result.
	CharThresholds int

	// ClassesToPreserve are the classes that readability sets itself.
	ClassesToPreserve []string

	// TagsToScore lists the element tags to score by default.
	TagsToScore []string

	// contains filtered or unexported fields
}
Readability is an HTML parser that reads and extracts relevant content.
func New ¶
func New() *Readability
New returns a new Readability with sane defaults to parse simple documents.
func (*Readability) IsReadable ¶
func (r *Readability) IsReadable(input io.Reader) bool
IsReadable decides whether the document is usable or not without parsing the whole thing. In the original `mozilla/readability` library, this method is located in `Readability-readable.js`.