Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ScrapeWebData ¶
ScrapeWebData initiates the scraping process for the given list of URIs. It returns the collected data (the scraped sections from each URI) serialized as JSON, and an error if any occurred during the scraping process.
Parameters:
- uri: []string - list of URLs to scrape
- depth: int - how many levels of subpages to scrape
Returns:
- []byte - JSON representation of the collected data
- error - any error that occurred during the scraping process
Example usage:
    go func() {
        res, err := scraper.ScrapeWebData([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
        if err != nil {
            logrus.WithError(err).Error("Error collecting data")
            return
        }
        logrus.WithField("result", string(res)).Info("Scraping completed")
    }()
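Since the function returns JSON bytes, a caller will usually want to decode them back into typed values. The following is a minimal sketch, assuming the returned JSON follows the struct tags of the CollectedData and Section types documented on this page. The types are mirrored locally so the example compiles on its own; decodeCollected and the sample payload are hypothetical and not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Section and CollectedData mirror the documented types so this sketch
// is self-contained. In real code, use the package's own types instead.
type Section struct {
	Title      string   `json:"title"`
	Paragraphs []string `json:"paragraphs"`
	Images     []string `json:"images"`
}

type CollectedData struct {
	Sections []Section `json:"sections"`
	Pages    []string  `json:"pages"`
}

// decodeCollected is a hypothetical helper that unmarshals the bytes
// returned by ScrapeWebData. It assumes the JSON layout matches the
// struct tags above.
func decodeCollected(raw []byte) (CollectedData, error) {
	var data CollectedData
	if err := json.Unmarshal(raw, &data); err != nil {
		return CollectedData{}, fmt.Errorf("decode collected data: %w", err)
	}
	return data, nil
}

func main() {
	// Hand-written sample payload standing in for a real scrape result.
	raw := []byte(`{"sections":[{"title":"Maize","paragraphs":["Sample paragraph."],"images":[]}],"pages":["https://en.wikipedia.org/wiki/Maize"]}`)

	data, err := decodeCollected(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range data.Sections {
		fmt.Printf("%s: %d paragraph(s)\n", s.Title, len(s.Paragraphs))
	}
}
```

Wrapping the error with %w keeps the underlying json error available to callers through errors.Is and errors.As.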
func ScrapeWebDataForSentiment ¶
ScrapeWebDataForSentiment initiates the scraping process for the given list of URIs and runs sentiment analysis on the result using the given model. It returns the scraped data, the sentiment analysis result, and an error if any occurred during the process.
Parameters:
- uris: []string - list of URLs to scrape
- depth: int - how many levels of subpages to scrape
- model: string - model to use for sentiment analysis
Returns:
- string - scraped data
- string - sentiment analysis result
- error - any error that occurred during the process
Example:
    go func() {
        data, sentiment, err := ScrapeWebDataForSentiment([]string{"https://en.wikipedia.org/wiki/Maize"}, 5, "gpt-3.5-turbo")
        if err != nil {
            logrus.WithError(err).Error("Failed to collect data")
            return
        }
        logrus.WithFields(logrus.Fields{
            "data":      data,
            "sentiment": sentiment,
        }).Info("Scraping and sentiment analysis completed")
    }()
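The example runs the call in a goroutine, so the caller needs some way to wait for the result before the program exits. One possible approach is sketched below using a buffered channel. It assumes only the documented return shape (data, sentiment, error); sentimentFunc, result, startAsync, and the stub in main are hypothetical names introduced for illustration, and the stub stands in for the real function so the sketch runs without network access.

```go
package main

import "fmt"

// sentimentFunc matches the documented parameters and return values of
// ScrapeWebDataForSentiment. The type name itself is hypothetical.
type sentimentFunc func(uris []string, depth int, model string) (string, string, error)

// result bundles the three return values so they can be sent over a channel.
type result struct {
	Data      string
	Sentiment string
	Err       error
}

// startAsync runs fn in a goroutine and returns a channel that receives
// exactly one result. The buffer of 1 lets the goroutine finish even if
// the caller never reads from the channel.
func startAsync(fn sentimentFunc, uris []string, depth int, model string) <-chan result {
	out := make(chan result, 1)
	go func() {
		data, sentiment, err := fn(uris, depth, model)
		out <- result{Data: data, Sentiment: sentiment, Err: err}
	}()
	return out
}

func main() {
	// Stub standing in for the real scraper call.
	stub := func(uris []string, depth int, model string) (string, string, error) {
		return "scraped text", "neutral", nil
	}

	// Receiving from the channel blocks until the goroutine has finished.
	res := <-startAsync(stub, []string{"https://en.wikipedia.org/wiki/Maize"}, 1, "gpt-3.5-turbo")
	if res.Err != nil {
		fmt.Println("error:", res.Err)
		return
	}
	fmt.Println(res.Data, res.Sentiment)
}
```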
Types ¶
type CollectedData ¶
    type CollectedData struct {
        Sections []Section `json:"sections"` // Sections is a collection of webpage sections that have been scraped.
        Pages    []string  `json:"pages"`
    }
CollectedData represents the aggregated result of the scraping process. It contains Sections, a slice of Section structs that each represent a distinct part of a scraped webpage, and Pages, a slice of strings serialized under the "pages" key.
type Section ¶
    type Section struct {
        Title      string   `json:"title"`      // Title is the heading text of the section.
        Paragraphs []string `json:"paragraphs"` // Paragraphs contains all the text content of the section.
        Images     []string `json:"images"`     // Images may hold base64-encoded image data (unconfirmed).
    }
Section represents a distinct part of a scraped webpage, typically defined by a heading. It contains a Title, representing the heading of the section; Paragraphs, a slice of strings containing the text content found within that section; and Images, a slice of strings for the section's images, possibly base64-encoded.
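To make the JSON layout implied by the struct tags concrete, here is a small sketch that builds a value by hand and marshals it with encoding/json. The types are mirrored locally so the example is self-contained; encodeCollected and the sample values are hypothetical, and treating Pages as a list of page URLs is an assumption, not something this page states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the documented types, kept identical to the
// declarations above so the sketch compiles on its own.
type Section struct {
	Title      string   `json:"title"`
	Paragraphs []string `json:"paragraphs"`
	Images     []string `json:"images"`
}

type CollectedData struct {
	Sections []Section `json:"sections"`
	Pages    []string  `json:"pages"`
}

// encodeCollected is a hypothetical helper that serializes the data
// using the field names given by the struct tags.
func encodeCollected(data CollectedData) (string, error) {
	raw, err := json.Marshal(data)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	data := CollectedData{
		Sections: []Section{
			// Images is left nil here, which encoding/json writes as null.
			{Title: "Intro", Paragraphs: []string{"First paragraph."}},
		},
		// Assumption: Pages lists page URLs.
		Pages: []string{"https://example.com"},
	}

	out, err := encodeCollected(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// Prints:
	// {"sections":[{"title":"Intro","paragraphs":["First paragraph."],"images":null}],"pages":["https://example.com"]}
}
```

Note that a nil slice is encoded as null rather than []. Consumers that expect an array for "images" may need the field initialized to an empty slice before marshaling.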