Documentation ¶
Index ¶
- func IsProbablyReaderable(htmlSource string, opts ...Option) bool
- type Option
- func AllowedVideoRegex(rgx *regexp.Regexp) Option
- func CharThreshold(n int) Option
- func ClassesToPreserve(classes ...string) Option
- func DisableJSONLD(b bool) Option
- func KeepClasses(b bool) Option
- func LogLevel(l slog.Level) Option
- func MaxElemsToParse(n int) Option
- func MinContentLength(len int) Option
- func MinScore(score float64) Option
- func NTopCandidates(n int) Option
- func Serializer(f func(*node) string) Option
- func VisibilityChecker(f func(*html.Node) bool) Option
- type Options
- type Readability
- type Result
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsProbablyReaderable ¶
func IsProbablyReaderable(htmlSource string, opts ...Option) bool
IsProbablyReaderable decides whether the document is readerable without parsing the whole thing. Options:
- options.minContentLength (default 140): the minimum node content length used to decide if the document is readerable
- options.minScore (default 20): the minimum cumulative score used to determine if the document is readerable
- options.visibilityChecker (default isNodeVisible): the function used to determine if a node is visible
These correspond by name to the MinContentLength, MinScore, and VisibilityChecker option constructors listed in the index.
Types ¶
type Option ¶
type Option func(*Options)
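The declaration above is the functional options pattern: each Option is a function that mutates an Options value. The sketch below shows how such options are typically defined and applied. The fields of `Options`, the default of 500, and the helper `buildOptions` are assumptions for illustration only, since the real struct's fields are not shown in this documentation.

```go
package main

import "fmt"

// Options is a hypothetical stand-in with two illustrative fields.
type Options struct {
	charThreshold int
	keepClasses   bool
}

// Option mirrors the documented declaration: a function that edits Options.
type Option func(*Options)

// CharThreshold returns an Option that sets the character threshold.
func CharThreshold(n int) Option {
	return func(o *Options) { o.charThreshold = n }
}

// KeepClasses returns an Option that toggles class preservation.
func KeepClasses(b bool) Option {
	return func(o *Options) { o.keepClasses = b }
}

// buildOptions (hypothetical) starts from defaults and applies each Option
// in order, so later options override earlier ones.
func buildOptions(opts ...Option) Options {
	o := Options{charThreshold: 500} // assumed default, for illustration
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", buildOptions())
	fmt.Printf("%+v\n", buildOptions(CharThreshold(200), KeepClasses(true)))
}
```

A variadic `opts ...Option` parameter, as in the documented signatures, lets callers pass no options at all and still get usable defaults.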
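func AllowedVideoRegex ¶
func AllowedVideoRegex(rgx *regexp.Regexp) Option
func CharThreshold ¶
func CharThreshold(n int) Option
func ClassesToPreserve ¶
func ClassesToPreserve(classes ...string) Option
func DisableJSONLD ¶
func DisableJSONLD(b bool) Option
func KeepClasses ¶
func KeepClasses(b bool) Option
func LogLevel ¶
func LogLevel(l slog.Level) Option
func MaxElemsToParse ¶
func MaxElemsToParse(n int) Option
func MinContentLength ¶
func MinContentLength(len int) Option
func MinScore ¶
func MinScore(score float64) Option
func NTopCandidates ¶
func NTopCandidates(n int) Option
func Serializer ¶
func Serializer(f func(*node) string) Option
func VisibilityChecker ¶
func VisibilityChecker(f func(*html.Node) bool) Option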
type Readability ¶
type Readability struct {
// contains filtered or unexported fields
}
func New ¶
func New(htmlSource, uri string, opts ...Option) (*Readability, error)
New is the public constructor of Readability and it supports the following options:
- options.debug
- options.maxElemsToParse
- options.nbTopCandidates
- options.charThreshold
- options.classesToPreserve
- options.keepClasses
- options.serializer
func (*Readability) Parse ¶
func (r *Readability) Parse() (*Result, error)
Runs readability. Workflow:
- Prep the document by removing script tags, css, etc.
- Build readability's DOM tree.
- Grab the article content from the current DOM tree.
- Replace the current DOM tree with the new one.
- Read peacefully.
type Result ¶
type Result struct {
	// article title
	Title string
	// HTML string of processed article content
	Content string
	// text content of the article, with all the HTML tags removed
	TextContent string
	// length of an article, in characters (runes)
	Length int
	// article description, or short excerpt from the content
	Excerpt string
	// author metadata
	Byline string
	// content direction
	Dir string
	// name of the site
	SiteName string
	// content language
	Lang string
	// published time
	PublishedTime string
}
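The Length field is described as counting characters (runes), which differs from Go's built-in `len` on strings, since `len` counts bytes. The sketch below illustrates the difference. The helper `articleLength` is hypothetical and only shows one way a rune count can be computed; it is not taken from this package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// articleLength is a hypothetical helper that counts runes (Unicode code
// points) rather than bytes.
func articleLength(textContent string) int {
	return utf8.RuneCountInString(textContent)
}

func main() {
	text := "Caf\u00e9 \u2615" // "Café" followed by a space and a coffee-cup symbol

	fmt.Println(len(text))           // 9 (bytes in the UTF-8 encoding)
	fmt.Println(articleLength(text)) // 6 (runes)
}
```

For ASCII-only articles the two counts agree; they diverge as soon as the text contains multi-byte characters.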