Documentation ¶
Overview ¶
Package webscraper provides the Webpage Scraper Tool, a utility within the Atomic Agents ecosystem that scrapes web content and converts it to markdown. It also extracts page metadata and cleans up the content for better readability.
Constants ¶
const (
	DefaultUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
	DefaultAccept    = "text/html,application/xhtml+xml,application/xml;"
)
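The two constants read as HTTP header values. As a minimal sketch of how such values are typically applied to a request with the standard net/http package: newPageRequest is a hypothetical helper, not part of this package, and the constants are copied locally so the example compiles on its own. How the package itself uses them is not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/http"
)

// Local copies of the documented constants, so the sketch is self-contained.
const (
	DefaultUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
	DefaultAccept    = "text/html,application/xhtml+xml,application/xml;"
)

// newPageRequest is a hypothetical helper (an assumption, not package API).
// It builds a GET request carrying the default User-Agent and Accept headers.
func newPageRequest(target string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", DefaultUserAgent)
	req.Header.Set("Accept", DefaultAccept)
	return req, nil
}

func main() {
	req, err := newPageRequest("https://example.com")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println("Accept:", req.Header.Get("Accept"))
}
```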
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Input ¶
type Input struct {
	// URL of the webpage to scrape.
	URL string `json:"url,omitempty" jsonschema:"title=url,description=URL of the webpage to scrape." validate:"required,url"`
	// IncludeLinks Whether to preserve hyperlinks in the markdown output.
	IncludeLinks bool `` /* 130-byte string literal not displayed */
}
Input is the input schema for the WebpageScraperTool.
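The URL field carries the tag validate:"required,url". Which validation library consumes that tag is not stated here, so the following sketch only approximates the rule with the standard net/url package. checkInput is a hypothetical helper, the Input type is a local copy, and the IncludeLinks tag is left out because the listing does not display it.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Input is a local copy of the documented struct. The IncludeLinks tag is
// omitted because the listing does not display it.
type Input struct {
	URL          string `json:"url,omitempty"`
	IncludeLinks bool
}

// checkInput is a hypothetical stand-in for the "required,url" rule.
// It is an approximation, not the validation the package actually runs.
func checkInput(in Input) error {
	if in.URL == "" {
		return errors.New("url is required")
	}
	parsed, err := url.ParseRequestURI(in.URL)
	if err != nil {
		return fmt.Errorf("url is not valid: %w", err)
	}
	if parsed.Scheme == "" || parsed.Host == "" {
		return errors.New("url must include a scheme and a host")
	}
	return nil
}

func main() {
	for _, in := range []Input{
		{URL: "https://example.com", IncludeLinks: true},
		{URL: ""},
		{URL: "not a url"},
	} {
		fmt.Printf("%q valid: %t\n", in.URL, checkInput(in) == nil)
	}
}
```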
type Metadata ¶
type Metadata struct {
	// Title is the title of the webpage.
	Title string `json:"url,omitempty" jsonschema:"title=title,description=The title of the webpage."`
	// Author is the author of the webpage content.
	Author string `json:"author,omitempty" jsonschema:"title=author,description=The Author of the webpage."`
	// Description is the meta description of the webpage.
	Description string `json:"description,omitempty" jsonschema:"title=description,description=The meta description of the webpage."`
	// Keywords is the meta keywords of the webpage.
	Keywords string `json:"keywords,omitempty" jsonschema:"title=keywords,description=The meta keywords of the webpage."`
	// SiteName is the name of the website.
	SiteName string `json:"sitename,omitempty" jsonschema:"title=sitename,description=The name of the website."`
	// Domain is the domain name of the website.
	Domain string `json:"domain,omitempty" jsonschema:"title=domain,description=The domain name of the website."`
}
Metadata is the schema for webpage metadata.
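As listed, the Title field carries the JSON tag json:"url,omitempty", so with encoding/json it is written under the key url, not title. The sketch below shows that effect on a local copy of the struct trimmed to two fields. encodeMetadata is a hypothetical helper, and whether the tag is intentional is not stated in this documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata is a local copy trimmed to two fields, with the JSON tags as listed.
type Metadata struct {
	Title  string `json:"url,omitempty"`
	Author string `json:"author,omitempty"`
}

// encodeMetadata is a hypothetical helper that marshals Metadata to a JSON string.
func encodeMetadata(m Metadata) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeMetadata(Metadata{Title: "Example Domain"})
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	// The empty Author is dropped by omitempty; Title appears under "url".
	fmt.Println(out) // {"url":"Example Domain"}
}
```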
type Option ¶
type Option func(*Config)
func WithHttpClient ¶
func WithMaxContentLength ¶
func WithTimeout ¶
func WithUserAgent ¶
type Output ¶
type Output struct {
	// Content The scraped content in markdown format.
	Content string `json:"content,omitempty" jsonschema:"title=content,description=The scraped content in markdown format."`
	// Metadata is metadata about the scraped webpage.
	Metadata *Metadata `json:"metadata,omitempty" jsonschema:"title=metadata,description=Metadata about the webpage."`
}
Output is the schema for the output of the WebpageScraperTool.
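Because Metadata is a pointer tagged omitempty, a serialized Output may have no metadata key at all, and the decoded field is then nil. A minimal sketch of decoding such a payload safely, using local copies of the types trimmed to a few fields; decodeOutput and domainOf are hypothetical helpers, not package API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local, trimmed copies of the documented types with the JSON tags as listed.
type Metadata struct {
	Author string `json:"author,omitempty"`
	Domain string `json:"domain,omitempty"`
}

type Output struct {
	Content  string    `json:"content,omitempty"`
	Metadata *Metadata `json:"metadata,omitempty"`
}

// decodeOutput is a hypothetical helper that parses a JSON payload into Output.
func decodeOutput(raw string) (Output, error) {
	var out Output
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

// domainOf is a hypothetical helper that guards against a nil Metadata pointer.
func domainOf(out Output) string {
	if out.Metadata == nil {
		return ""
	}
	return out.Metadata.Domain
}

func main() {
	out, err := decodeOutput(`{"content":"# Title","metadata":{"domain":"example.com"}}`)
	if err != nil {
		fmt.Println("decoding failed:", err)
		return
	}
	fmt.Println(out.Content, domainOf(out))
}
```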
type Webscraper ¶
type Webscraper struct {
Config
}
func New ¶ added in v1.0.1
func New(opts ...Option) *Webscraper
func (*Webscraper) RunAnonymous ¶ added in v1.0.8
RunAnonymous runs the tool for tool orchestration.