Documentation ¶
Overview ¶
webextractor provides default interfaces for Colibri, ready to start crawling or extracting data on the web.
Index ¶
- func New(cookieJar ...http.CookieJar) (*colibri.Colibri, error)
- type Client
- type ReqDelay
- type Response
- func (resp *Response) Body() io.ReadCloser
- func (resp *Response) Do(rules *colibri.Rules) (colibri.Response, error)
- func (resp *Response) Extract(rules *colibri.Rules) (*colibri.Output, error)
- func (resp *Response) Header() http.Header
- func (resp *Response) Redirects() []*url.URL
- func (resp *Response) Serializable() map[string]any
- func (resp *Response) StatusCode() int
- func (resp *Response) URL() *url.URL
- type RobotsData
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func New ¶
func New(cookieJar ...http.CookieJar) (*colibri.Colibri, error)
Types ¶
type Client ¶
type Client struct {
	// Jar specifies the cookie jar.
	Jar http.CookieJar
	// contains filtered or unexported fields
}
Client represents an HTTP client. See the colibri.HTTPClient interface.
type ReqDelay ¶
type ReqDelay struct {
// contains filtered or unexported fields
}
ReqDelay manages the delay between each HTTP request. See the colibri.Delay interface.
type Response ¶
Response represents an HTTP response. See the colibri.Response interface.
func (*Response) Body ¶
func (resp *Response) Body() io.ReadCloser
func (*Response) Serializable ¶
func (resp *Response) Serializable() map[string]any
func (*Response) StatusCode ¶
func (resp *Response) StatusCode() int
type RobotsData ¶
type RobotsData struct {
// contains filtered or unexported fields
}
RobotsData gets, stores and parses robots.txt restrictions.
func NewRobotsData ¶
func NewRobotsData() *RobotsData
NewRobotsData returns a new RobotsData structure.
func (*RobotsData) Clear ¶
func (robots *RobotsData) Clear()
Clear removes stored robots.txt restrictions.