package web

Version: v0.8.3 (latest), Published: Oct 8, 2024, License: MIT, Imports: 8, Imported by: 0

Repository

github.com/masa-finance/masa-oracle

Documentation

Index

func ScrapeWebData(uri []string, depth int) ([]byte, error)
type CollectedData
type Section

Constants

This section is empty.

Variables

This section is empty.

Functions

func ScrapeWebData

func ScrapeWebData(uri []string, depth int) ([]byte, error)

ScrapeWebData initiates the scraping process for the given list of URIs. It returns the JSON encoding of a CollectedData struct containing the scraped sections from each URI, and an error if any occurred during the scraping process.

Parameters:

uri: []string - list of URLs to scrape
depth: int - how many levels of subpages to scrape

Returns:

[]byte - JSON representation of the collected data
error - any error that occurred during the scraping process

Example usage:

// Run the scrape in a goroutine so the caller is not blocked.
go func() {
	res, err := web.ScrapeWebData([]string{"https://en.wikipedia.org/wiki/Maize"}, 5)
	if err != nil {
		logrus.WithError(err).Error("Error collecting data")
		return
	}
	// res holds JSON bytes; convert to string for logging.
	logrus.WithField("result", string(res)).Info("Scraping completed")
}()
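
Because ScrapeWebData returns JSON bytes rather than a struct, callers that want typed access need to decode the result themselves. The following is a minimal sketch of that step, not the package's own code: the types are local mirrors of the documented ones so the example compiles without importing the package, decodeCollected is a hypothetical helper, and the payload is hand-written sample data rather than real scraper output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Section and CollectedData mirror the documented types so that this
// sketch compiles on its own.
type Section struct {
	Title      string   `json:"title"`
	Paragraphs []string `json:"paragraphs"`
	Images     []string `json:"images"`
}

type CollectedData struct {
	Sections []Section `json:"sections"`
	Pages    []string  `json:"pages"`
}

// decodeCollected is a hypothetical helper: it unmarshals JSON bytes of the
// kind ScrapeWebData returns into a CollectedData value.
func decodeCollected(raw []byte) (CollectedData, error) {
	var data CollectedData
	if err := json.Unmarshal(raw, &data); err != nil {
		return CollectedData{}, fmt.Errorf("decoding scrape result: %w", err)
	}
	return data, nil
}

func main() {
	// Hand-written sample payload standing in for real ScrapeWebData output.
	raw := []byte(`{"sections":[{"title":"Maize","paragraphs":["Maize is a cereal grain."],"images":[]}],"pages":["https://en.wikipedia.org/wiki/Maize"]}`)

	data, err := decodeCollected(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range data.Sections {
		fmt.Printf("%s: %d paragraph(s)\n", s.Title, len(s.Paragraphs))
	}
}
```

In real use, the raw bytes would come from the first return value of ScrapeWebData after its error has been checked.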

Types

type CollectedData

type CollectedData struct {
	Sections []Section `json:"sections"` // Sections is a collection of webpage sections that have been scraped.
	Pages    []string  `json:"pages"`
}

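CollectedData represents the aggregated result of the scraping process. It contains Sections, a slice of Section structs that each represent a distinct part of a scraped webpage, and Pages, a slice of strings serialized under the "pages" key whose contents the source leaves undocumented.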
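
The struct tags determine the JSON shape that consumers see. As a rough illustration, the sketch below marshals a locally mirrored CollectedData value with encoding/json; encode is a hypothetical helper and the field values are made up. Since the tags carry no omitempty option, a nil slice is encoded as null rather than being dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the documented types, so the sketch is self-contained.
type Section struct {
	Title      string   `json:"title"`
	Paragraphs []string `json:"paragraphs"`
	Images     []string `json:"images"`
}

type CollectedData struct {
	Sections []Section `json:"sections"`
	Pages    []string  `json:"pages"`
}

// encode is a hypothetical helper that returns the JSON form of data.
func encode(data CollectedData) (string, error) {
	b, err := json.Marshal(data)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Made-up values; Images is left nil on purpose.
	data := CollectedData{
		Sections: []Section{{Title: "Intro", Paragraphs: []string{"First paragraph."}}},
		Pages:    []string{"https://example.com"},
	}

	out, err := encode(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// Prints:
	// {"sections":[{"title":"Intro","paragraphs":["First paragraph."],"images":null}],"pages":["https://example.com"]}
}
```

Consumers written in other languages should therefore be prepared for null in place of an empty array for any of the slice fields.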

type Section

type Section struct {
	Title      string   `json:"title"`      // Title is the heading text of the section.
	Paragraphs []string `json:"paragraphs"` // Paragraphs contains all the text content of the section.
	Images     []string `json:"images"`     // Images holds the section's images, possibly base64-encoded (the source comment is unsure).
}

Section represents a distinct part of a scraped webpage, typically defined by a heading. It contains a Title, representing the heading of the section; Paragraphs, a slice of strings containing the text content found within that section; and Images, a slice of strings that the source comment suggests may hold base64-encoded data without confirming it.
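
Since a Section pairs a heading with its paragraphs, a common next step is flattening it back into readable text. The sketch below shows one way to do that under stated assumptions: Section is a local mirror of the documented type, renderSection is a hypothetical helper, and the sample values are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// Section mirrors the documented type so the sketch compiles on its own.
type Section struct {
	Title      string   `json:"title"`
	Paragraphs []string `json:"paragraphs"`
	Images     []string `json:"images"`
}

// renderSection is a hypothetical helper that flattens a Section to plain
// text: the title first, then each paragraph preceded by a blank line.
// Images are ignored because their encoding is not documented.
func renderSection(s Section) string {
	var b strings.Builder
	b.WriteString(s.Title)
	for _, p := range s.Paragraphs {
		b.WriteString("\n\n")
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	// Invented sample data.
	s := Section{
		Title:      "History",
		Paragraphs: []string{"First paragraph.", "Second paragraph."},
	}
	fmt.Println(renderSection(s))
}
```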

Source Files

web.go
