package
v0.0.0-...-b7c7836
Published: Mar 17, 2022 License: MIT Imports: 9 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FilterSelector

func FilterSelector(selector string) (string, string)

FilterSelector normalizes the selector string and returns its last element separately from the full selector path
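The exact character conversion is not documented here. As an illustration only, a minimal sketch of a function with this signature might collapse whitespace and split off the last element of the selector path (the normalization rule is an assumption, not the package's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// filterSelector is a hypothetical sketch, not the package's code:
// it collapses runs of whitespace in the selector and returns the
// cleaned full path together with its last element.
func filterSelector(selector string) (string, string) {
	full := strings.Join(strings.Fields(selector), " ")
	parts := strings.Split(full, " ")
	return full, parts[len(parts)-1]
}

func main() {
	full, last := filterSelector("div.content  h2   a")
	fmt.Println(full) // div.content h2 a
	fmt.Println(last) // a
}
```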

func FilterUrl

func FilterUrl(url string) string

FilterUrl normalizes a URL address into the required form (removing 'http://', etc.)
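The doc comment only hints at the requirements, so the following is a hedged sketch: a plausible version strips the scheme, a leading "www.", and any trailing slash. The real FilterUrl may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// filterUrl is a hypothetical sketch of FilterUrl's described behavior:
// strip the scheme and a leading "www.", plus any trailing slash.
func filterUrl(url string) string {
	url = strings.TrimPrefix(url, "https://")
	url = strings.TrimPrefix(url, "http://")
	url = strings.TrimPrefix(url, "www.")
	return strings.TrimSuffix(url, "/")
}

func main() {
	fmt.Println(filterUrl("http://www.example.com/")) // example.com
}
```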

func TakeUserElementSelectorInput

func TakeUserElementSelectorInput() []string

TakeUserElementSelectorInput reads element selectors from user input one by one and adds them to the site (used when scrape is invoked with the -a or -add flag)

func WriteToJsonFile

func WriteToJsonFile(sites []Site, OutputFile string)

Types

type Article

type Article struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

Article represents a single news article in a news source

type ScrapeData

type ScrapeData struct {
	Sites          []Site `json:"all_known_sites"`
	OutputFileName string `json:"output_file_name"`
	CurrentSite    Site   `json:"current_site"`
}

ScrapeData holds all of the read, saved, and generated site information needed for scraping and data checking

func (*ScrapeData) CreateSite

func (sd *ScrapeData) CreateSite(x int, url string, selector []string, articles []Article) Site

CreateSite creates a new valid news source

func (*ScrapeData) EnsureOutputFile

func (sd *ScrapeData) EnsureOutputFile() error

EnsureOutputFile checks if the output file exists, otherwise creates it

func (ScrapeData) EnsurePageExists

func (sd ScrapeData) EnsurePageExists(url string)

EnsurePageExists makes sure that the news webpage is added if it doesn't exist

func (ScrapeData) ExportSites

func (sd ScrapeData) ExportSites()

ExportSites writes all article data to the pre-defined JSON output file

func (*ScrapeData) ReadFile

func (sd *ScrapeData) ReadFile() []Site

ReadFile calls EnsureOutputFile to check output file validity and reads the saved site article data

func (*ScrapeData) ScrapeAllSites

func (sd *ScrapeData) ScrapeAllSites()

ScrapeAllSites scrapes data from all sites and stores it

type Site

type Site struct {
	ID        int       `json:"id"`
	Url       string    `json:"url"`
	Selectors []string  `json:"selectors"`
	Articles  []Article `json:"articles"`
}

Site represents a single news source (webpage)

func (*Site) ContainsArticle

func (s *Site) ContainsArticle(title string) bool

ContainsArticle checks if current site already contains the exact same article
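The documented check amounts to a linear scan for an exact title match; a sketch of that behavior (the comparison rule is an assumption based on the doc comment):

```go
package main

import "fmt"

// Minimal mirrors of the package's types, copied from the docs.
type Article struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

type Site struct {
	ID        int       `json:"id"`
	Url       string    `json:"url"`
	Selectors []string  `json:"selectors"`
	Articles  []Article `json:"articles"`
}

// containsArticle sketches ContainsArticle: scan the site's articles
// and report whether one has exactly the given title.
func (s *Site) containsArticle(title string) bool {
	for _, a := range s.Articles {
		if a.Title == title {
			return true
		}
	}
	return false
}

func main() {
	s := Site{Articles: []Article{{ID: 1, Title: "Breaking news"}}}
	fmt.Println(s.containsArticle("Breaking news")) // true
	fmt.Println(s.containsArticle("Old news"))      // false
}
```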

func (*Site) Export

func (s *Site) Export(sites []Site, OutputFile string)

Export outputs a single site to the output file

func (*Site) Scrape

func (s *Site) Scrape()

Scrape traverses the current site and scrapes the content of provided elements.
