Documentation ¶
Overview ¶
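surfer is a high-concurrency web downloader written in Go. It supports the GET/POST/HEAD methods and the http/https protocols, and offers two modes: a fixed UserAgent with cookies saved automatically, or a large pool of random UserAgents with cookies disabled. It closely simulates browser behavior and can be used for tasks such as simulated login.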
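The two modes can be pictured with the standard library alone. The following is a minimal sketch, not surfer's own code: `userAgents`, `newClient`, `pickUserAgent` and `newRequest` are hypothetical names, and the UserAgent strings are illustrative placeholders.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"net/http/cookiejar"
)

// userAgents is an illustrative pool; the real package's list is not shown here.
var userAgents = []string{
	"Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0",
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) ExampleBrowser/3.0",
}

// newClient returns a client that keeps cookies in a jar when enableCookie
// is true, and a client without any jar otherwise.
func newClient(enableCookie bool) (*http.Client, error) {
	c := &http.Client{}
	if enableCookie {
		jar, err := cookiejar.New(nil)
		if err != nil {
			return nil, err
		}
		c.Jar = jar
	}
	return c, nil
}

// pickUserAgent returns a fixed UserAgent in cookie mode (so the session
// looks like one browser) and a random one when cookies are disabled.
func pickUserAgent(enableCookie bool) string {
	if enableCookie {
		return userAgents[0]
	}
	return userAgents[rand.Intn(len(userAgents))]
}

// newRequest builds a GET request carrying the UserAgent for the chosen mode.
func newRequest(url string, enableCookie bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", pickUserAgent(enableCookie))
	return req, nil
}

func main() {
	for _, mode := range []bool{true, false} {
		client, err := newClient(mode)
		if err != nil {
			panic(err)
		}
		req, err := newRequest("http://example.com/", mode)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, client.Jar != nil, req.Header.Get("User-Agent"))
	}
}
```

No request is actually sent; the sketch only shows how the cookie jar and the UserAgent choice would be tied to the same switch.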
Index ¶
- Constants
- func AutoToUTF8(resp *http.Response) error
- func BodyBytes(resp *http.Response) ([]byte, error)
- func DestroyJsFiles()
- func Download(req Request) (resp *http.Response, err error)
- func GetWDPath() string
- func IsDirExists(path string) bool
- func IsFileExists(path string) bool
- func UrlEncode(urlStr string) (*url.URL, error)
- func WalkDir(targpath string, suffixes ...string) (dirlist []string)
- type Body
- type Cookie
- type DefaultRequest
- func (self *DefaultRequest) GetConnTimeout() time.Duration
- func (self *DefaultRequest) GetDialTimeout() time.Duration
- func (self *DefaultRequest) GetDownloaderID() int
- func (self *DefaultRequest) GetEnableCookie() bool
- func (self *DefaultRequest) GetHeader() http.Header
- func (self *DefaultRequest) GetMethod() string
- func (self *DefaultRequest) GetPostData() string
- func (self *DefaultRequest) GetProxy() string
- func (self *DefaultRequest) GetRedirectTimes() int
- func (self *DefaultRequest) GetRetryPause() time.Duration
- func (self *DefaultRequest) GetTryTimes() int
- func (self *DefaultRequest) GetUrl() string
- type DnsCache
- type Param
- type Phantom
- type Request
- type Response
- type Surf
- type Surfer
Constants ¶
Variables ¶
This section is empty.
Functions ¶
func AutoToUTF8 ¶
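When downloading with the surf kernel, AutoToUTF8 can be used to try converting the response to UTF-8 automatically. When downloading with the phantomjs kernel, no conversion is needed (the content is already UTF-8).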
func IsDirExists ¶
IsDirExists reports whether path is a directory.
func IsFileExists ¶
IsFileExists reports whether path is a file.
Types ¶
type Cookie ¶
type Cookie struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Domain string `json:"domain"`
	Path   string `json:"path"`
}
Cookie is used to pass cookies to phantomjs.
type DefaultRequest ¶
type DefaultRequest struct {
	// url (required)
	Url string
	// GET POST POST-M HEAD (defaults to GET)
	Method string
	// http header
	Header http.Header
	// whether to use cookies; set via the Spider's EnableCookie
	EnableCookie bool
	// POST values
	PostData string
	// dial tcp: i/o timeout
	DialTimeout time.Duration
	// WSARecv tcp: i/o timeout
	ConnTimeout time.Duration
	// the max times of download
	TryTimes int
	// how long to pause before a retry
	RetryPause time.Duration
	// max redirect times
	// when RedirectTimes equals 0, redirect times is ∞
	// when RedirectTimes is less than 0, redirect times is 0
	RedirectTimes int
	// the download ProxyHost
	Proxy string

	// selects the downloader by ID
	// 0: the Surf high-concurrency downloader, with the full set of controls
	// 1: the PhantomJS downloader, better at getting past anti-crawler defenses, but slow and low-concurrency
	DownloaderID int
	// contains filtered or unexported fields
}
DefaultRequest is the default implementation of Request.
func (*DefaultRequest) GetConnTimeout ¶
func (self *DefaultRequest) GetConnTimeout() time.Duration
WSARecv tcp: i/o timeout
func (*DefaultRequest) GetDialTimeout ¶
func (self *DefaultRequest) GetDialTimeout() time.Duration
dial tcp: i/o timeout
func (*DefaultRequest) GetDownloaderID ¶
func (self *DefaultRequest) GetDownloaderID() int
select Surf or PhantomJS
func (*DefaultRequest) GetEnableCookie ¶
func (self *DefaultRequest) GetEnableCookie() bool
enable http cookies
func (*DefaultRequest) GetMethod ¶
func (self *DefaultRequest) GetMethod() string
GET POST POST-M HEAD
func (*DefaultRequest) GetProxy ¶
func (self *DefaultRequest) GetProxy() string
the download ProxyHost
func (*DefaultRequest) GetRedirectTimes ¶
func (self *DefaultRequest) GetRedirectTimes() int
max redirect times
func (*DefaultRequest) GetRetryPause ¶
func (self *DefaultRequest) GetRetryPause() time.Duration
the pause time of retry
func (*DefaultRequest) GetTryTimes ¶
func (self *DefaultRequest) GetTryTimes() int
the maximum number of download attempts
type DnsCache ¶
type DnsCache struct {
// contains filtered or unexported fields
}
DnsCache is a DNS cache.
type Phantom ¶
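type Phantom struct {
	PhantomjsFile string // full file name of the Phantomjs executable
	TempJsDir     string // directory for temporary js files
	CookieJar     *cookiejar.Jar
	// contains filtered or unexported fields
}
Phantom is a downloader implementation based on Phantomjs, provided as a supplement to surfer. It is much slower than surfer, but because it simulates a browser it is better at getting past anti-crawler defenses. It supports UserAgent/TryTimes/RetryPause/custom js.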
type Request ¶
type Request interface {
	// url
	GetUrl() string
	// GET POST POST-M HEAD
	GetMethod() string
	// POST values
	GetPostData() string
	// http header
	GetHeader() http.Header
	// enable http cookies
	GetEnableCookie() bool
	// dial tcp: i/o timeout
	GetDialTimeout() time.Duration
	// WSARecv tcp: i/o timeout
	GetConnTimeout() time.Duration
	// the max times of download
	GetTryTimes() int
	// the pause time of retry
	GetRetryPause() time.Duration
	// the download ProxyHost
	GetProxy() string
	// max redirect times
	GetRedirectTimes() int
	// select Surf or PhantomJS
	GetDownloaderID() int
}
type Response ¶
type Response struct {
	Cookies []string
	Body    string
	Error   string
	Header  []struct {
		Name  string
		Value string
	}
}
Response is used to parse the response content returned by Phantomjs.
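As an illustration, the sketch below decodes such a response and turns its name/value pairs into an `http.Header`. It assumes the Phantomjs script prints a JSON object whose keys match the field names; the wire format is not stated in this documentation. `parsePhantomOutput` is a hypothetical helper and the struct is copied locally.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// Response is a local copy of the documented struct.
type Response struct {
	Cookies []string
	Body    string
	Error   string
	Header  []struct {
		Name  string
		Value string
	}
}

// parsePhantomOutput decodes raw (assumed to be JSON) and rebuilds an
// http.Header from the Header pairs, adding each cookie as Set-Cookie.
// A non-empty Error field is returned as a Go error.
func parsePhantomOutput(raw []byte) (*Response, http.Header, error) {
	var r Response
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, nil, err
	}
	if r.Error != "" {
		return &r, nil, errors.New(r.Error)
	}
	h := make(http.Header)
	for _, kv := range r.Header {
		h.Add(kv.Name, kv.Value)
	}
	for _, c := range r.Cookies {
		h.Add("Set-Cookie", c)
	}
	return &r, h, nil
}

func main() {
	raw := []byte(`{"Cookies":["sid=abc; Path=/"],"Body":"<html></html>",` +
		`"Error":"","Header":[{"Name":"Content-Type","Value":"text/html"}]}`)
	r, h, err := parsePhantomOutput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Body, h.Get("Content-Type"), h.Get("Set-Cookie"))
}
```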
type Surfer ¶
type Surfer interface {
	// GET @param url string, header http.Header, cookies []*http.Cookie
	// HEAD @param url string, header http.Header, cookies []*http.Cookie
	// POST PostForm @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
	// POST-M PostMultipart @param url, referer string, values url.Values, header http.Header, cookies []*http.Cookie
	Download(Request) (resp *http.Response, err error)
}
Downloader represents a core of an HTTP web browser for a crawler.