Documentation ¶
Index ¶
- func FilterSelector(selector string) (string, string)
- func FilterUrl(url string) string
- func TakeUserElementSelectorInput() []string
- func WriteToJsonFile(sites []Site, OutputFile string)
- type Article
- type ScrapeData
- func (sd *ScrapeData) CreateSite(x int, url string, selector []string, articles []Article) Site
- func (sd *ScrapeData) EnsureOutputFile() error
- func (sd ScrapeData) EnsurePageExists(url string)
- func (sd ScrapeData) ExportSites()
- func (sd *ScrapeData) ReadFile() []Site
- func (sd *ScrapeData) ScrapeAllSites()
- type Site
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func FilterSelector ¶
func FilterSelector(selector string) (string, string)

FilterSelector converts characters and returns the last element of the selector path separately from the full path
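The package does not show FilterSelector's body, so the following is only a sketch of the behavior the doc comment describes, assuming the selector path uses `>` as its separator; the lowercase name marks it as hypothetical, not the actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// filterSelector is a hypothetical sketch of FilterSelector: it trims
// whitespace around each step of a '>'-separated selector path and
// returns the last element alongside the full, normalized path.
func filterSelector(selector string) (last, full string) {
	parts := strings.Split(selector, ">")
	for i, p := range parts {
		parts[i] = strings.TrimSpace(p)
	}
	full = strings.Join(parts, " > ")
	last = parts[len(parts)-1]
	return last, full
}

func main() {
	last, full := filterSelector("div.content > ul > li.headline")
	fmt.Println(last) // li.headline
	fmt.Println(full) // div.content > ul > li.headline
}
```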
func FilterUrl ¶
func FilterUrl(url string) string

FilterUrl modifies the URL address to fit the specified requirements (remove 'http://' etc.)
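Only the 'http://' removal is documented; a minimal sketch consistent with that, assuming trailing slashes are also stripped (an assumption, signalled by the lowercase name):

```go
package main

import (
	"fmt"
	"strings"
)

// filterUrl is a hypothetical sketch of FilterUrl: it strips the scheme
// prefix and any trailing slash so URLs compare consistently.
func filterUrl(url string) string {
	url = strings.TrimPrefix(url, "http://")
	url = strings.TrimPrefix(url, "https://")
	return strings.TrimSuffix(url, "/")
}

func main() {
	fmt.Println(filterUrl("https://example.com/news/")) // example.com/news
}
```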
func TakeUserElementSelectorInput ¶
func TakeUserElementSelectorInput() []string
TakeUserElementSelectorInput reads element selectors from user input one by one and adds them to the site (used when scrape is invoked with the -a or -add flag)
func WriteToJsonFile ¶

func WriteToJsonFile(sites []Site, OutputFile string)

WriteToJsonFile writes the given sites to OutputFile as JSON
Types ¶
type ScrapeData ¶
type ScrapeData struct {
	Sites          []Site `json:"all_known_sites"`
	OutputFileName string `json:"output_file_name"`
	CurrentSite    Site   `json:"current_site"`
}
ScrapeData holds all of the read, saved, and generated site information needed for scraping and data checking
func (*ScrapeData) CreateSite ¶
func (sd *ScrapeData) CreateSite(x int, url string, selector []string, articles []Article) Site

CreateSite creates a new valid news source
func (*ScrapeData) EnsureOutputFile ¶
func (sd *ScrapeData) EnsureOutputFile() error
EnsureOutputFile checks if the output file exists, otherwise creates it
func (ScrapeData) EnsurePageExists ¶
func (sd ScrapeData) EnsurePageExists(url string)
EnsurePageExists adds the news webpage if it is not already known
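A sketch of that add-if-missing check. Note the documented method has a value receiver, under which an append would not persist; the sketch uses a pointer receiver on the assumption that persistence is the intent, and the ID assignment is likewise an assumption.

```go
package main

import "fmt"

type Site struct {
	ID  int
	Url string
}

type ScrapeData struct {
	Sites []Site
}

// EnsurePageExists sketches the documented method: if no known site has
// the given URL, a new Site entry is appended for it.
func (sd *ScrapeData) EnsurePageExists(url string) {
	for _, s := range sd.Sites {
		if s.Url == url {
			return // already known
		}
	}
	sd.Sites = append(sd.Sites, Site{ID: len(sd.Sites), Url: url})
}

func main() {
	sd := &ScrapeData{}
	sd.EnsurePageExists("example.com")
	sd.EnsurePageExists("example.com") // duplicate URL, nothing added
	fmt.Println(len(sd.Sites)) // 1
}
```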
func (ScrapeData) ExportSites ¶
func (sd ScrapeData) ExportSites()
ExportSites writes all article data to the pre-defined JSON output file
func (*ScrapeData) ReadFile ¶
func (sd *ScrapeData) ReadFile() []Site
ReadFile calls EnsureOutputFile to validate the output file, then reads and returns the stored site and article data
func (*ScrapeData) ScrapeAllSites ¶
func (sd *ScrapeData) ScrapeAllSites()
ScrapeAllSites scrapes data from all sites and stores it
type Site ¶
type Site struct {
	ID        int       `json:"id"`
	Url       string    `json:"url"`
	Selectors []string  `json:"selectors"`
	Articles  []Article `json:"articles"`
}
Site represents a single news source (webpage)
func (*Site) ContainsArticle ¶
ContainsArticle checks whether the current site already contains an identical article
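The method's signature is not listed in the index, so the parameter and return types below are assumptions; this is a minimal sketch of an "exact same article" check, with Article's fields (undocumented here) also assumed.

```go
package main

import "fmt"

// Article's fields aren't documented; Title and Link are assumptions.
type Article struct {
	Title string
	Link  string
}

type Site struct {
	Articles []Article
}

// ContainsArticle sketches the documented method: it reports whether the
// site already holds an identical article, comparing all fields.
func (s *Site) ContainsArticle(a Article) bool {
	for _, existing := range s.Articles {
		if existing == a {
			return true
		}
	}
	return false
}

func main() {
	s := Site{Articles: []Article{{Title: "Hello", Link: "example.com/hello"}}}
	fmt.Println(s.ContainsArticle(Article{Title: "Hello", Link: "example.com/hello"})) // true
	fmt.Println(s.ContainsArticle(Article{Title: "Other"}))                            // false
}
```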