Package webscraper

v1.1.5 (latest), published Feb 17, 2025. License: MIT. Imports: 12. Imported by: 0.

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Documentation

Overview

Package webscraper provides the Webpage Scraper Tool, a utility in the Atomic Agents ecosystem that scrapes web content and converts it to Markdown. It also extracts page metadata and cleans up the content for better readability.

Index

Constants
type Config
type Input
- func NewInput(link string, includeLinks bool) *Input
type Metadata
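type Option
- func WithHttpClient(clt *http.Client) Option
- func WithMaxContentLength(l int64) Option
- func WithTimeout(timeout int) Option
- func WithUserAgent(ua string) Option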
type Output
- func NewOutput(content string, metadata *Metadata) *Output
type Webscraper
- func New(opts ...Option) *Webscraper
- func (t *Webscraper) Run(ctx context.Context, input *Input, output *Output) error
- func (t *Webscraper) RunAnonymous(ctx context.Context, input any) (any, error)

Constants

const (
	DefaultUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
	DefaultAccept    = "text/html,application/xhtml+xml,application/xml;"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	tools.Config
	// contains filtered or unexported fields
}

type Input

type Input struct {
	// URL of the webpage to scrape.
	URL string `json:"url,omitempty" jsonschema:"title=url,description=URL of the webpage to scrape." validate:"required,url"`
	// IncludeLinks specifies whether to preserve hyperlinks in the Markdown output.
	IncludeLinks bool `` /* 130-byte string literal not displayed */
}

Input is the input schema for the WebpageScraperTool.

func NewInput

func NewInput(link string, includeLinks bool) *Input
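`NewInput` takes the page URL and the `IncludeLinks` flag. The sketch below shows a plausible constructor body plus a standard-library approximation of the `validate:"required,url"` rule on `Input.URL`. Both the constructor body and `validateInput` are assumptions; the tag syntax matches the go-playground/validator package, whose `url` rule differs in detail from this check, and this sketch does not use it.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Input mirrors the documented fields; struct tags are omitted in this sketch.
type Input struct {
	URL          string
	IncludeLinks bool
}

// NewInput is a sketch of the constructor: it only copies its arguments.
func NewInput(link string, includeLinks bool) *Input {
	return &Input{URL: link, IncludeLinks: includeLinks}
}

// validateInput approximates `validate:"required,url"`: the URL must be
// non-empty and parse as an absolute URL with both a scheme and a host.
func validateInput(in *Input) error {
	if in.URL == "" {
		return errors.New("url is required")
	}
	u, err := url.Parse(in.URL)
	if err != nil {
		return fmt.Errorf("invalid url: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("invalid url %q: scheme and host are required", in.URL)
	}
	return nil
}

func main() {
	fmt.Println(validateInput(NewInput("https://example.com/post", true))) // <nil>
	fmt.Println(validateInput(NewInput("example.com/post", false)) != nil) // true
}
```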

type Metadata

type Metadata struct {
	// Title is the title of the webpage.
	Title string `json:"url,omitempty" jsonschema:"title=title,description=The title of the webpage."`
	// Author is the author of the webpage content.
	Author string `json:"author,omitempty" jsonschema:"title=author,description=The Author of the webpage."`
	// Description is the meta description of the webpage.
	Description string `json:"description,omitempty" jsonschema:"title=description,description=The meta description of the webpage."`
	// Keywords is the meta keywords of the webpage.
	Keywords string `json:"keywords,omitempty" jsonschema:"title=keywords,description=The meta keywords of the webpage."`
	// SiteName is the name of the website.
	SiteName string `json:"sitename,omitempty" jsonschema:"title=sitename,description=The name of the website."`
	// Domain is the domain name of the website.
	Domain string `json:"domain,omitempty" jsonschema:"title=domain,description=The domain name of the website."`
}

Metadata is the schema for webpage metadata.

type Option

type Option func(*Config)

func WithHttpClient

func WithHttpClient(clt *http.Client) Option

func WithMaxContentLength

func WithMaxContentLength(l int64) Option

func WithTimeout

func WithTimeout(timeout int) Option

func WithUserAgent

func WithUserAgent(ua string) Option

type Output

type Output struct {
	// Content is the scraped content in Markdown format.
	Content string `json:"content,omitempty" jsonschema:"title=content,description=The scraped content in markdown format."`
	// Metadata is metadata about the scraped webpage.
	Metadata *Metadata `json:"metadata,omitempty" jsonschema:"title=metadata,description=Metadata about the webpage."`
}

Output is the output schema for the WebpageScraperTool.

func NewOutput

func NewOutput(content string, metadata *Metadata) *Output
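`NewOutput` pairs the Markdown content with its `*Metadata`. Because `Metadata` is a pointer tagged `omitempty`, passing `nil` drops the `"metadata"` key from the encoded JSON entirely. The constructor body below is an assumption; the JSON tags are copied from the listing, with `Metadata` trimmed to one field for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is trimmed to a single documented field for this sketch.
type Metadata struct {
	SiteName string `json:"sitename,omitempty"`
}

// Output copies the documented JSON tags; jsonschema tags are omitted.
type Output struct {
	Content  string    `json:"content,omitempty"`
	Metadata *Metadata `json:"metadata,omitempty"`
}

// NewOutput is a sketch of the constructor: it only stores its arguments.
func NewOutput(content string, metadata *Metadata) *Output {
	return &Output{Content: content, Metadata: metadata}
}

// encode marshals an Output to a JSON string, panicking on error.
func encode(o *Output) string {
	b, err := json.Marshal(o)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(NewOutput("# Hello", nil)))
	// {"content":"# Hello"}
	fmt.Println(encode(NewOutput("# Hello", &Metadata{SiteName: "Example"})))
	// {"content":"# Hello","metadata":{"sitename":"Example"}}
}
```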

type Webscraper

type Webscraper struct {
	Config
}

func New (added in v1.0.1)

func New(opts ...Option) *Webscraper

func (*Webscraper) Run

func (t *Webscraper) Run(ctx context.Context, input *Input, output *Output) error

func (*Webscraper) RunAnonymous (added in v1.0.8)

func (t *Webscraper) RunAnonymous(ctx context.Context, input any) (any, error)

RunAnonymous runs the tool with untyped (any) input and output, for use in tool orchestration.
