Documentation
Index
- Variables
- func DoRequest(url string, timeout int, headers map[string]string) ([]byte, error)
- func Get(url string, timeout int, maxRetries int) ([]byte, error)
- func GetFileExtenstion(file *[]byte) (string, error)
- func SaveFile(data []byte, path string) error
- func SaveFiles(results <-chan []*CdxResponse, outputDir string, errors chan error, ...)
- type CdxResponse
- type RequestConfig
- type Source
Constants
This section is empty.
Variables
var Status500Error = errors.New("Server returned 500 status response. (Slow down)")
var Status503Error = errors.New("Server returned 503 status response")
Functions
func GetFileExtenstion
Types
type CdxResponse
type CdxResponse struct {
    Urlkey       string `json:"urlkey,omitempty"`
    Timestamp    string `json:"timestamp,omitempty"`
    Charset      string `json:"charset,omitempty"`
    MimeType     string `json:"mime,omitempty"`
    Languages    string `json:"languages,omitempty"`
    MimeDetected string `json:"mimedetected,omitempty"`
    Digest       string `json:"digest,omitempty"`
    Offset       string `json:"offset,omitempty"`
    Original     string `json:"url,omitempty"` // Original URL
    Length       string `json:"length,omitempty"`
    StatusCode   string `json:"status,omitempty"`
    Filename     string `json:"filename,omitempty"`
    Source       Source
}
CdxResponse is the CDX API response structure used by WebArchive and Common Crawl (index.commoncrawl.org).
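As an illustration of how the JSON tags on CdxResponse map onto a record, the sketch below decodes one JSON object into a local struct that copies a subset of the documented fields and tags. The type cdxRecord, the helper decodeRecord, and the sample record are all made up for this example; the package's own parsing goes through Source.ParseResponse, whose wire format is not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cdxRecord copies a subset of CdxResponse's fields and JSON tags so the
// example is self-contained. Note that every value is a string, including
// the status code.
type cdxRecord struct {
	Urlkey     string `json:"urlkey,omitempty"`
	Timestamp  string `json:"timestamp,omitempty"`
	MimeType   string `json:"mime,omitempty"`
	Original   string `json:"url,omitempty"` // Original URL
	StatusCode string `json:"status,omitempty"`
}

// decodeRecord is a hypothetical helper that decodes a single JSON object.
func decodeRecord(line []byte) (*cdxRecord, error) {
	var rec cdxRecord
	if err := json.Unmarshal(line, &rec); err != nil {
		return nil, err
	}
	return &rec, nil
}

func main() {
	// Invented sample data, shaped to match the tags above.
	sample := []byte(`{"urlkey":"com,example)/a.pdf","timestamp":"20200101000000",` +
		`"url":"http://example.com/a.pdf","mime":"application/pdf","status":"200"}`)

	rec, err := decodeRecord(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(rec.Original, rec.MimeType, rec.StatusCode)
}
```

The field named Original is populated from the "url" key, and MimeType from "mime", so the Go names and the JSON keys do not always match.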
type RequestConfig
type RequestConfig struct {
    URL            string   // URL to parse
    Filters        []string // Extensions to search for
    Limit          uint     // Max number of results per page
    CollapseColumn string   // Which column to use to collapse results
    SinglePage     bool     // Get results only from the 1st page (mostly used for tests)
    FromDate       string   // Filter results from date
    ToDate         string   // Filter results to date
}
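A short sketch of filling in a RequestConfig. The struct is redeclared locally so the example compiles on its own. The helper pdfConfig and every value in it are assumptions: the exact filter syntax (for example "pdf" versus ".pdf") and the accepted collapse column names are not documented here, and the date fields are left empty because their expected format is not stated.

```go
package main

import "fmt"

// RequestConfig is a local copy of the documented struct.
type RequestConfig struct {
	URL            string   // URL to parse
	Filters        []string // Extensions to search for
	Limit          uint     // Max number of results per page
	CollapseColumn string   // Which column to use to collapse results
	SinglePage     bool     // Get results only from the 1st page
	FromDate       string   // Filter results from date
	ToDate         string   // Filter results to date
}

// pdfConfig is a hypothetical helper that builds a small, single-page
// query for PDF files under the given URL.
func pdfConfig(url string) RequestConfig {
	return RequestConfig{
		URL:            url,
		Filters:        []string{"pdf"}, // assumed filter syntax
		Limit:          100,
		CollapseColumn: "urlkey", // assumed column name
		SinglePage:     true,     // keeps the query small, as in tests
	}
}

func main() {
	cfg := pdfConfig("example.com")
	fmt.Printf("%+v\n", cfg)
}
```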
type Source
type Source interface {
    Name() string
    ParseResponse(resp []byte) ([]*CdxResponse, error)
    GetNumPages(url string) (int, error)
    GetPages(config RequestConfig) ([]*CdxResponse, error)
    FetchPages(config RequestConfig, results chan []*CdxResponse, errors chan error)
    GetFile(*CdxResponse) ([]byte, error)
}
Source of web archive data
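To show the shape of the interface, here is a sketch of an in-memory implementation that could stand in for a real archive in tests. The types are trimmed local copies so the example compiles on its own, and staticSource is hypothetical. The stub ignores the config it receives, and it does not close the results channel, because the interface does not say which side owns the channels.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed local copies of the documented types.
type CdxResponse struct {
	Original string // Original URL
	Source   Source
}

type RequestConfig struct {
	URL        string
	SinglePage bool
}

type Source interface {
	Name() string
	ParseResponse(resp []byte) ([]*CdxResponse, error)
	GetNumPages(url string) (int, error)
	GetPages(config RequestConfig) ([]*CdxResponse, error)
	FetchPages(config RequestConfig, results chan []*CdxResponse, errors chan error)
	GetFile(*CdxResponse) ([]byte, error)
}

// staticSource is a hypothetical Source that serves files from a map
// keyed by original URL.
type staticSource struct {
	files map[string][]byte
}

// Compile-time check that staticSource satisfies Source.
var _ Source = (*staticSource)(nil)

func (s *staticSource) Name() string { return "static" }

func (s *staticSource) ParseResponse(resp []byte) ([]*CdxResponse, error) {
	return nil, errors.New("static source has no wire format")
}

func (s *staticSource) GetNumPages(url string) (int, error) { return 1, nil }

// GetPages returns one record per stored file; the config is ignored.
func (s *staticSource) GetPages(config RequestConfig) ([]*CdxResponse, error) {
	var out []*CdxResponse
	for u := range s.files {
		out = append(out, &CdxResponse{Original: u, Source: s})
	}
	return out, nil
}

// FetchPages sends a single page of results, or a single error.
func (s *staticSource) FetchPages(config RequestConfig, results chan []*CdxResponse, errs chan error) {
	pages, err := s.GetPages(config)
	if err != nil {
		errs <- err
		return
	}
	results <- pages
}

func (s *staticSource) GetFile(r *CdxResponse) ([]byte, error) {
	data, ok := s.files[r.Original]
	if !ok {
		return nil, fmt.Errorf("no file for %q", r.Original)
	}
	return data, nil
}

func main() {
	var src Source = &staticSource{files: map[string][]byte{
		"http://example.com/a.txt": []byte("hello"),
	}}
	pages, _ := src.GetPages(RequestConfig{URL: "example.com"})
	for _, p := range pages {
		// Each record carries its Source, so the file can be fetched from it.
		data, err := p.Source.GetFile(p)
		fmt.Println(p.Original, len(data), err)
	}
}
```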