Go-Readability
Go-Readability is a Go package that removes clutter such as buttons, ads, and background images from an HTML page, and changes the page's text size, contrast, and layout for better readability.
This package is a fork of readability by ying32, which was inspired by readability for Node.js and readability for Python. I have also added some functions from Mozilla's Readability.
Why fork?
There are several reasons why I created a new fork instead of sending a PR to the original repository:
- It seems GitHub is hard to access from China, which is why ying32 is not very active on his repository.
- Most of the comments and documentation in the original repository are in Chinese, which unfortunately I am still not able to understand.
Example
package main

import (
	"fmt"
	nurl "net/url"
	"time"

	"github.com/RadhiFadlillah/go-readability"
)

func main() {
	// Create URL
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"
	parsedURL, err := nurl.Parse(url)
	if err != nil {
		panic(err)
	}

	// Fetch readable content, giving up after 5 seconds
	article, err := readability.FromURL(parsedURL, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Show results
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)
	fmt.Println(article.Content)
}