Documentation ¶
Functions ¶
func ScrapeWebData ¶
ScrapeWebData initiates the scraping process for the given list of URIs. It returns a CollectedData struct containing the scraped sections from each URI, and an error if any occurred during the scraping process.
Usage:
@param uri: []string - list of URLs to scrape
@param depth: int - how many levels of subpages to scrape
Example:
go func() {
	res, err := scraper.ScrapeWebData([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
	if err != nil {
		logrus.Errorf("Error collecting data: %s", err.Error())
		return
	}
	logrus.Infof("%+v", res)
}()
func ScrapeWebDataForSentiment ¶
ScrapeWebDataForSentiment initiates the scraping process for the given list of URIs. It returns a CollectedData struct containing the scraped sections from each URI, and an error if any occurred during the scraping process.
Usage:
@param uri: []string - list of URLs to scrape
@param depth: int - how many levels of subpages to scrape
@param model: string - model to use for sentiment analysis
Example:
go func() {
	// Note: the model parameter listed above is not passed in this example.
	res, err := scraper.ScrapeWebDataForSentiment([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
	if err != nil {
		logrus.Errorf("Error collecting data: %s", err.Error())
		return
	}
	logrus.Infof("%+v", res)
}()
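Both examples above start the call in a goroutine, so the caller has to wait for it before the program exits or the result is never logged. The following is a minimal sketch of that pattern using sync.WaitGroup. It does not call the scraper package: scrapeStub is a hypothetical stand-in, and its string return type is an assumption, not the package's documented signature.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// scrapeStub is a hypothetical stand-in for the scraper call.
// It performs no network I/O; its return type is an assumption.
func scrapeStub(uris []string, depth int) (string, error) {
	if len(uris) == 0 {
		return "", errors.New("no URIs given")
	}
	return fmt.Sprintf("scraped %d URI(s) at depth %d", len(uris), depth), nil
}

// runAsync runs scrapeStub in a goroutine and blocks until it finishes.
func runAsync(uris []string, depth int) (string, error) {
	var (
		wg  sync.WaitGroup
		res string
		err error
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		res, err = scrapeStub(uris, depth)
	}()
	wg.Wait() // without this, the caller could return before the goroutine completes
	return res, err
}

func main() {
	res, err := runAsync([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
	if err != nil {
		fmt.Println("error collecting data:", err)
		return
	}
	fmt.Println(res)
}
```

In a long-running service the goroutine can simply be left to log its result, as in the examples above. The WaitGroup matters mainly in short-lived programs and tests.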
Types ¶
type CollectedData ¶
type CollectedData struct {
Sections []Section // Sections is a collection of webpage sections that have been scraped.
}
CollectedData represents the aggregated result of the scraping process. It contains a slice of Section structs, each representing a distinct part of a scraped webpage.
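As a rough illustration of how a caller might walk this structure, the sketch below copies the two type declarations from this page into a standalone program and summarizes hand-built sample data. The summarize helper and the sample values are hypothetical and are not part of the package.

```go
package main

import "fmt"

// Section and CollectedData are local copies of the declarations on this page.
type Section struct {
	Title      string
	Paragraphs []string
	Images     []string
}

type CollectedData struct {
	Sections []Section
}

// summarize (hypothetical helper) returns one line per section:
// the section title followed by its paragraph count.
func summarize(data CollectedData) []string {
	lines := make([]string, 0, len(data.Sections))
	for _, s := range data.Sections {
		lines = append(lines, fmt.Sprintf("%s (%d paragraphs)", s.Title, len(s.Paragraphs)))
	}
	return lines
}

func main() {
	// Hand-built sample data; no scraping is performed here.
	data := CollectedData{Sections: []Section{
		{Title: "History", Paragraphs: []string{"First paragraph.", "Second paragraph."}},
		{Title: "Cultivation", Paragraphs: []string{"Only paragraph."}},
	}}
	for _, line := range summarize(data) {
		fmt.Println(line)
	}
}
```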
type Section ¶
type Section struct {
	Title      string   // Title is the heading text of the section.
	Paragraphs []string // Paragraphs contains all the text content of the section.
	Images     []string // Images may hold base64-encoded image data (unconfirmed).
}
Section represents a distinct part of a scraped webpage, typically defined by a heading. It contains a Title, representing the heading of the section; Paragraphs, a slice of strings containing the text content found within that section; and Images, a slice of strings whose encoding is not confirmed (the field comment suggests base64).
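One way to turn a Section back into readable text is to join its title and paragraphs, as in the sketch below. It uses a local copy of the Section declaration. The sectionText helper is hypothetical, and it skips Images because their encoding is not confirmed.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a local copy of the declaration on this page.
type Section struct {
	Title      string
	Paragraphs []string
	Images     []string
}

// sectionText (hypothetical helper) joins the title and paragraphs into one
// plain-text block separated by blank lines. An empty title is omitted.
// Images are ignored.
func sectionText(s Section) string {
	parts := make([]string, 0, len(s.Paragraphs)+1)
	if s.Title != "" {
		parts = append(parts, s.Title)
	}
	parts = append(parts, s.Paragraphs...)
	return strings.Join(parts, "\n\n")
}

func main() {
	// Hand-built sample section; no scraping is performed here.
	s := Section{
		Title:      "History",
		Paragraphs: []string{"First paragraph.", "Second paragraph."},
	}
	fmt.Println(sectionText(s))
}
```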