Documentation ¶
Index ¶
- Variables
- type Extractor
- func New() *Extractor
- func (e *Extractor) Extract(url string, urlContent *string) (*Extractor, error)
- func (e *Extractor) GetExtracted() map[Syntax]any
- func (e *Extractor) GetExtractedJSON() json.RawMessage
- func (e *Extractor) SetFetchTimeout(fetchTimeout uint8) *Extractor
- func (e *Extractor) SetSyntaxes(syntaxes []Syntax) *Extractor
- func (e *Extractor) SetUserAgent(userAgent string) *Extractor
- type Processor
- type Syntax
Constants ¶
This section is empty.
Variables ¶
var SYNTAXES = []Syntax{SyntaxOpenGraph, SyntaxXCards, SyntaxJSONLD, SyntaxMicrodata}
SYNTAXES is the slice of metadata syntax identifiers supported for parsing.
Functions ¶
This section is empty.
Types ¶
type Extractor ¶
type Extractor struct {
// contains filtered or unexported fields
}
Extractor is a struct used for extracting metadata from web content or a provided URL. It utilizes various processors.
func New ¶
func New() *Extractor
New creates a new instance of Extractor with default configurations and an empty map for extracted data.
func (*Extractor) Extract ¶
func (e *Extractor) Extract(url string, urlContent *string) (*Extractor, error)
Extract retrieves metadata from the specified URL or from the provided content and processes it using various parsers.
url: The URL to extract metadata from.
urlContent: Optional pointer to a string containing HTML content. If nil, the content at the URL is fetched.
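The optional urlContent parameter can be pictured with a small stand-in. The sketch below does not call this package: resolveContent and its fetch parameter are hypothetical names, and the code only illustrates the "use the supplied HTML, otherwise fetch" rule described above, not how Extract is actually implemented.

```go
package main

import "fmt"

// resolveContent is a hypothetical helper, not part of this package.
// It returns *content when content is non-nil and calls fetch(url) otherwise.
func resolveContent(url string, content *string, fetch func(string) (string, error)) (string, error) {
	if content != nil {
		return *content, nil
	}
	return fetch(url)
}

func main() {
	// fakeFetch stands in for an HTTP request so the sketch runs offline.
	fakeFetch := func(url string) (string, error) {
		return "<html><!-- fetched from " + url + " --></html>", nil
	}

	provided := "<html><!-- provided by caller --></html>"

	fromArg, _ := resolveContent("https://example.com", &provided, fakeFetch)
	fromNet, _ := resolveContent("https://example.com", nil, fakeFetch)

	fmt.Println(fromArg)
	fmt.Println(fromNet)
}
```

Passing the fetcher as a function keeps the sketch deterministic; a real caller would simply pass nil or a pointer to HTML it already holds.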
func (*Extractor) GetExtracted ¶
func (e *Extractor) GetExtracted() map[Syntax]any
GetExtracted returns the metadata extracted by the Extractor instance as a map keyed by Syntax, the name of the processor that produced each entry.
func (*Extractor) GetExtractedJSON ¶
func (e *Extractor) GetExtractedJSON() json.RawMessage
GetExtractedJSON returns the extracted metadata as indented JSON in a json.RawMessage (a byte slice).
func (*Extractor) SetFetchTimeout ¶
func (e *Extractor) SetFetchTimeout(fetchTimeout uint8) *Extractor
SetFetchTimeout sets the HTTP client's fetch timeout in seconds.
fetchTimeout: A uint8 value (0 to 255) giving the timeout duration in seconds.
Returns the updated Extractor instance.
func (*Extractor) SetSyntaxes ¶
func (e *Extractor) SetSyntaxes(syntaxes []Syntax) *Extractor
SetSyntaxes sets the syntaxes that the Extractor will use for parsing metadata. Unsupported syntaxes are filtered out.
syntaxes: A slice of Syntax representing the desired syntaxes.
Returns the updated Extractor instance.
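The filtering step can be sketched as a membership check against the supported list. Everything below is a local stand-in: the Syntax type and constants are copied from this page, while the supported variable and filterSupported function are hypothetical names. The real method may order or deduplicate its result differently.

```go
package main

import "fmt"

// Syntax and the constants below are local copies of the documented identifiers.
type Syntax string

const (
	SyntaxOpenGraph Syntax = "opengraph"
	SyntaxXCards    Syntax = "xcards"
	SyntaxJSONLD    Syntax = "json-ld"
	SyntaxMicrodata Syntax = "microdata"
)

// supported plays the role of the package-level SYNTAXES variable.
var supported = []Syntax{SyntaxOpenGraph, SyntaxXCards, SyntaxJSONLD, SyntaxMicrodata}

// filterSupported is a hypothetical helper, not part of this package.
// It keeps the requested syntaxes that are supported, in the order requested.
func filterSupported(requested []Syntax) []Syntax {
	allowed := make(map[Syntax]bool, len(supported))
	for _, s := range supported {
		allowed[s] = true
	}

	kept := make([]Syntax, 0, len(requested))
	for _, s := range requested {
		if allowed[s] {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	// "rdfa" is not in the supported list, so it is dropped.
	fmt.Println(filterSupported([]Syntax{SyntaxJSONLD, "rdfa", SyntaxOpenGraph}))
	// prints: [json-ld opengraph]
}
```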
func (*Extractor) SetUserAgent ¶
func (e *Extractor) SetUserAgent(userAgent string) *Extractor
SetUserAgent sets the User-Agent header for the HTTP client used by the Extractor.
userAgent: A string representing the User-Agent to set for HTTP requests.
Returns the updated Extractor instance.
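Because each setter returns the updated instance, calls can be chained. The sketch below reproduces that pattern on a hypothetical settings type so that it compiles without this package; the field names and the sample User-Agent string are invented for the example.

```go
package main

import "fmt"

// settings is a hypothetical stand-in for the Extractor's unexported fields.
type settings struct {
	userAgent    string
	fetchTimeout uint8
}

// SetUserAgent stores the value and returns the receiver to allow chaining.
func (s *settings) SetUserAgent(userAgent string) *settings {
	s.userAgent = userAgent
	return s
}

// SetFetchTimeout stores the value and returns the receiver to allow chaining.
func (s *settings) SetFetchTimeout(fetchTimeout uint8) *settings {
	s.fetchTimeout = fetchTimeout
	return s
}

func main() {
	s := (&settings{}).
		SetUserAgent("example-bot/1.0").
		SetFetchTimeout(15)

	fmt.Println(s.userAgent, s.fetchTimeout) // example-bot/1.0 15
}
```

Since the setters mutate and return the same pointer, the chained form and three separate statements leave the value in the same state.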
type Processor ¶
Processor represents a data structure to hold a processor's name and function for extracting metadata.
type Syntax ¶
type Syntax string
const (
	// SyntaxOpenGraph is the identifier used for the Open Graph metadata syntax.
	SyntaxOpenGraph Syntax = "opengraph"
	// SyntaxXCards is the identifier used for the X Cards metadata syntax.
	SyntaxXCards Syntax = "xcards"
	// SyntaxJSONLD is the identifier used for the JSON-LD metadata syntax.
	SyntaxJSONLD Syntax = "json-ld"
	// SyntaxMicrodata is the identifier used for the W3C Microdata metadata syntax.
	SyntaxMicrodata Syntax = "microdata"
)