scraper

package
v0.0.1-beta
Published: Apr 29, 2024 License: MIT Imports: 3 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ScrapeWebData

func ScrapeWebData(uri []string, depth int) (string, error)

ScrapeWebData starts the scraping process for the given list of URIs. Per its signature it returns a string and an error; the string presumably carries the scraped sections from each URI (see CollectedData under Types) in serialized form, and the error is non-nil if anything went wrong during scraping. Usage: @param uri: []string - URLs to scrape @param depth: int - how many levels of subpages to scrape Example:

go func() {
	res, err := scraper.ScrapeWebData([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
	if err != nil {
		logrus.Errorf("Error collecting data: %s", err.Error())
		return
	}
	logrus.Infof("%+v", res)
}()
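
The example above launches a single goroutine and only logs the result. When the results are needed back in the caller, one possible pattern is to run one call per URI and wait for all of them. The following is a minimal sketch, not part of the scraper package: scrapeEach, result, and fakeScrape are hypothetical names, and fakeScrape is a stand-in used so that the sketch runs without network access. Only the function signature is taken from the documentation above.

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeFunc matches the documented signature of ScrapeWebData.
type scrapeFunc func(uri []string, depth int) (string, error)

// result pairs one URI with the outcome of scraping it (hypothetical type).
type result struct {
	URI  string
	Data string
	Err  error
}

// scrapeEach is a hypothetical helper. It calls scrape once per URI, each in
// its own goroutine, and returns the results in the same order as uris.
func scrapeEach(scrape scrapeFunc, uris []string, depth int) []result {
	results := make([]result, len(uris))
	var wg sync.WaitGroup
	for i, u := range uris {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			data, err := scrape([]string{u}, depth)
			// Each goroutine writes to its own slice index, so no lock is needed.
			results[i] = result{URI: u, Data: data, Err: err}
		}(i, u)
	}
	wg.Wait()
	return results
}

// fakeScrape is a stand-in for the real scraper; it performs no I/O.
func fakeScrape(uri []string, depth int) (string, error) {
	if len(uri) == 0 {
		return "", fmt.Errorf("no URIs given")
	}
	return fmt.Sprintf("scraped %s at depth %d", uri[0], depth), nil
}

func main() {
	uris := []string{"https://example.com/a", "https://example.com/b"}
	for _, r := range scrapeEach(fakeScrape, uris, 1) {
		if r.Err != nil {
			fmt.Println("error:", r.Err)
			continue
		}
		fmt.Println(r.Data)
	}
}
```

Since scraper.ScrapeWebData has the same signature as scrapeFunc, it could presumably be passed in place of fakeScrape; whether the package tolerates concurrent calls is not stated in this documentation.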

func ScrapeWebDataForSentiment

func ScrapeWebDataForSentiment(uri []string, depth int, model string) (string, string, error)

ScrapeWebDataForSentiment starts the scraping process for the given list of URIs and runs sentiment analysis with the given model. Per its signature it returns two strings and an error; the documentation does not say what each string holds, but they presumably carry the scraped data (see CollectedData under Types) and the sentiment result. The error is non-nil if anything went wrong during the process. Usage: @param uri: []string - URLs to scrape @param depth: int - how many levels of subpages to scrape @param model: string - model to use for sentiment analysis Example:

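go func() {
	// model is a placeholder variable: set it to the identifier of the sentiment model to use.
	// The names res and sentiment are assumptions; the two string results are not documented.
	res, sentiment, err := scraper.ScrapeWebDataForSentiment([]string{"https://en.wikipedia.org/wiki/Maize"}, 5, model)
	if err != nil {
		logrus.Errorf("Error collecting data: %s", err.Error())
		return
	}
	logrus.Infof("%+v %+v", res, sentiment)
}()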

Types

type CollectedData

type CollectedData struct {
	Sections []Section // Sections is a collection of webpage sections that have been scraped.
}

CollectedData represents the aggregated result of the scraping process. It contains a slice of Section structs, each representing a distinct part of a scraped webpage.

type Section

type Section struct {
	Title      string   // Title is the heading text of the section.
	Paragraphs []string // Paragraphs contains all the text content of the section.
	Images     []string // Images holds the section's images, possibly base64-encoded (not confirmed in the source).
}

Section represents a distinct part of a scraped webpage, typically defined by a heading. It contains a Title, representing the heading of the section; Paragraphs, a slice of strings containing the text content found within that section; and Images, a slice of strings for the section's images, whose encoding the source leaves undecided.
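
To show how the two types fit together, here is a minimal sketch that builds a CollectedData value by hand and walks its sections. The type definitions are copied from the documentation above so that the sketch is self-contained. WordCounts is a hypothetical helper, not part of the scraper package, and the sample data is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Section and CollectedData mirror the type definitions documented above.
type Section struct {
	Title      string
	Paragraphs []string
	Images     []string
}

type CollectedData struct {
	Sections []Section
}

// WordCounts is a hypothetical helper. It maps each section title to the
// number of whitespace-separated words across that section's paragraphs.
// Sections that share a title are summed together.
func WordCounts(data CollectedData) map[string]int {
	counts := make(map[string]int, len(data.Sections))
	for _, s := range data.Sections {
		n := 0
		for _, p := range s.Paragraphs {
			n += len(strings.Fields(p))
		}
		counts[s.Title] += n
	}
	return counts
}

func main() {
	// Invented sample data standing in for a real scrape result.
	data := CollectedData{Sections: []Section{
		{Title: "Intro", Paragraphs: []string{"first paragraph text", "second one"}},
		{Title: "Details", Paragraphs: []string{"a single paragraph"}},
	}}
	for title, n := range WordCounts(data) {
		fmt.Printf("%s: %d words\n", title, n)
	}
}
```

Map iteration order in Go is not fixed, so the two lines may print in either order.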
