loaders

package
v0.1.5
Published: May 30, 2024 License: MIT Imports: 11 Imported by: 1

Documentation

Index

Constants

const (
	BODY_EXPR       = "" /* 224-byte string literal not displayed */
	BODY_EXPR_SHORT = ".ArticleBase-Body, .post, .content, article, body"
)
const (
	YC_HACKERNEWS_SOURCE = "YC HACKER NEWS"
	MEDIUM_SOURCE        = "MEDIUM"
)
const (
	ARTICLE = "article"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Document

type Document struct {
	Kind        string   `json:"kind,omitempty"`
	URL         string   `json:"url,omitempty"`
	Source      string   `json:"source,omitempty"`
	Title       string   `json:"title,omitempty"`
	Text        string   `json:"text,omitempty"`
	Author      string   `json:"author,omitempty"`
	PublishDate int64    `json:"created,omitempty"`
	Keywords    []string `json:"keywords,omitempty"`
	Comments    int      `json:"comments,omitempty"`
	Likes       int      `json:"likes,omitempty"`
}
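
The struct tags above control how a Document serializes to JSON: every field is dropped when empty (`omitempty`), and PublishDate is written under the key `created`, not `publishDate`. A minimal sketch of that behavior follows. It uses a local copy of the struct so it compiles without importing the package, and the field values (including the timestamp, whose unit the documentation does not state) are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Document is a local copy of the struct shown above, so this example
// does not depend on the package's import path.
type Document struct {
	Kind        string   `json:"kind,omitempty"`
	URL         string   `json:"url,omitempty"`
	Source      string   `json:"source,omitempty"`
	Title       string   `json:"title,omitempty"`
	Text        string   `json:"text,omitempty"`
	Author      string   `json:"author,omitempty"`
	PublishDate int64    `json:"created,omitempty"`
	Keywords    []string `json:"keywords,omitempty"`
	Comments    int      `json:"comments,omitempty"`
	Likes       int      `json:"likes,omitempty"`
}

// encode marshals a Document to a JSON string.
func encode(d Document) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc := Document{
		Kind:        "article",
		URL:         "https://example.com/post",
		Title:       "Hello",
		PublishDate: 1717027200, // hypothetical value; unit is an assumption
	}
	out, err := encode(doc)
	if err != nil {
		panic(err)
	}
	// Only the four populated fields appear, and PublishDate is keyed "created".
	fmt.Println(out)
}
```

Because of `omitempty`, a Document with zero comments or likes omits those keys entirely, so consumers cannot distinguish "zero" from "not set" in the JSON form.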

func (*Document) String

func (c *Document) String() string

type WebLoader

type WebLoader struct {
	Config *WebLoaderConfig
	// contains filtered or unexported fields
}

WebLoader is a generic web site loader for web links and sites. The loaded content is cached.

func NewDefaultNewsSitemapLoader

func NewDefaultNewsSitemapLoader(days int, sitemap_url string) *WebLoader

Loads articles from https://feeds.feedburner.com/TheHackersNews that have been posted in the last N days

func NewDefaultWebTextLoader

func NewDefaultWebTextLoader(config *WebLoaderConfig) *WebLoader

sitemap_url can be "" if the collector is not intended for scraping any specific sitemap

func NewMediumSiteLoader

func NewMediumSiteLoader(days int) *WebLoader

Loads Medium posts from https://medium.com/sitemap/sitemap.xml that have been modified in the last N days

func NewRedditLinkLoader

func NewRedditLinkLoader() *WebLoader

func NewYCHackerNewsSiteLoader

func NewYCHackerNewsSiteLoader() *WebLoader

Loads story links from https://hacker-news.firebaseio.com/v0/topstories.json posted in the last N days
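
The topstories.json endpoint of the public Hacker News API returns a JSON array of item IDs, and each item is available at `/v0/item/<id>.json`. The sketch below shows one way to turn such a payload into item URLs. It is not taken from this package: `topStoryURLs`, `itemURL`, and `samplePayload` are hypothetical names, and the payload is made-up data, so no network access is needed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// itemURL builds the Hacker News API endpoint for a single item ID.
func itemURL(id int64) string {
	return fmt.Sprintf("https://hacker-news.firebaseio.com/v0/item/%d.json", id)
}

// topStoryURLs decodes a topstories.json payload (a JSON array of item IDs)
// and returns the item URLs for at most limit stories.
// A negative limit means "no limit".
func topStoryURLs(payload []byte, limit int) ([]string, error) {
	var ids []int64
	if err := json.Unmarshal(payload, &ids); err != nil {
		return nil, err
	}
	if limit >= 0 && len(ids) > limit {
		ids = ids[:limit]
	}
	urls := make([]string, 0, len(ids))
	for _, id := range ids {
		urls = append(urls, itemURL(id))
	}
	return urls, nil
}

// samplePayload is made-up data in the shape the endpoint returns.
const samplePayload = `[101, 102, 103]`

func main() {
	urls, err := topStoryURLs([]byte(samplePayload), 2)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Each item record carries its own URL and timestamp, so filtering by age would have to happen after the individual items are fetched.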

func (*WebLoader) Get

func (c *WebLoader) Get(url string) *Document

func (*WebLoader) ListAll

func (c *WebLoader) ListAll() []*Document

func (*WebLoader) LoadDocument

func (c *WebLoader) LoadDocument(url string) *Document

LoadDocument returns the extracted *Document if the URL serves an HTML body

func (*WebLoader) LoadSite

func (c *WebLoader) LoadSite() []*Document

LoadSite loads all the documents from a sitemap or RSS feed
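
To illustrate the "last N days" idea behind the sitemap-based loaders, here is a minimal sketch that parses a standard sitemap and keeps the links whose `<lastmod>` is recent enough. It is an assumption about the general technique, not this package's implementation: `recentLinks`, `sampleSitemap`, and `sampleNow` are hypothetical names, the sample data is invented, and only date-only `lastmod` values (YYYY-MM-DD) are handled.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"time"
)

// urlSet mirrors the <urlset> root element of a standard sitemap file.
type urlSet struct {
	URLs []sitemapURL `xml:"url"`
}

// sitemapURL mirrors one <url> entry.
type sitemapURL struct {
	Loc     string `xml:"loc"`
	LastMod string `xml:"lastmod"`
}

// recentLinks returns the <loc> values whose <lastmod> is no older than
// `days` days before `now`. Entries with a missing or unparsable date
// are skipped.
func recentLinks(data []byte, days int, now time.Time) ([]string, error) {
	var set urlSet
	if err := xml.Unmarshal(data, &set); err != nil {
		return nil, err
	}
	cutoff := now.AddDate(0, 0, -days)
	var links []string
	for _, u := range set.URLs {
		mod, err := time.Parse("2006-01-02", u.LastMod)
		if err != nil {
			continue // no usable date, so recency cannot be judged
		}
		if !mod.Before(cutoff) {
			links = append(links, u.Loc)
		}
	}
	return links, nil
}

// sampleSitemap is invented data in the standard sitemap format.
const sampleSitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/new</loc><lastmod>2024-05-29</lastmod></url>
  <url><loc>https://example.com/old</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/undated</loc></url>
</urlset>`

// sampleNow fixes "now" so the example is deterministic.
var sampleNow = time.Date(2024, 5, 30, 0, 0, 0, 0, time.UTC)

func main() {
	links, err := recentLinks([]byte(sampleSitemap), 7, sampleNow)
	if err != nil {
		panic(err)
	}
	fmt.Println(links)
}
```

Passing `now` as a parameter instead of calling time.Now inside the function keeps the filter testable with fixed dates.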

type WebLoaderConfig

type WebLoaderConfig struct {
	Sitemap           string
	DisallowedFilters []string
	Timeout           time.Duration
	LocalCache        string
}
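
The documentation does not say how DisallowedFilters is interpreted. The sketch below assumes the simplest reading, that a URL is rejected when it contains any filter as a substring. That is a guess, not confirmed behavior. It uses a local copy of WebLoaderConfig so it compiles on its own; `isDisallowed`, `exampleConfig`, and all field values (including the cache path) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// WebLoaderConfig is a local copy of the struct shown above, so this
// example does not depend on the package's import path.
type WebLoaderConfig struct {
	Sitemap           string
	DisallowedFilters []string
	Timeout           time.Duration
	LocalCache        string
}

// isDisallowed reports whether url contains any of the configured filters.
// ASSUMPTION: plain substring matching; the real package may use a
// different rule (for example prefixes or regular expressions).
func isDisallowed(cfg *WebLoaderConfig, url string) bool {
	for _, f := range cfg.DisallowedFilters {
		if f != "" && strings.Contains(url, f) {
			return true
		}
	}
	return false
}

// exampleConfig holds illustrative values only.
var exampleConfig = &WebLoaderConfig{
	Sitemap:           "https://example.com/sitemap.xml",
	DisallowedFilters: []string{"/tag/", "/login"},
	Timeout:           10 * time.Second,
	LocalCache:        ".cache/loaders", // hypothetical path
}

func main() {
	for _, u := range []string{
		"https://example.com/posts/1",
		"https://example.com/tag/go",
	} {
		fmt.Printf("%s disallowed=%v\n", u, isDisallowed(exampleConfig, u))
	}
}
```

With the real package, a config like this would be passed to NewDefaultWebTextLoader, which accepts a *WebLoaderConfig.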
