Documentation ¶
Overview ¶
Package page contains the result fetched by the Downloader, along with the result parsed by the PageProcesser.
Index ¶
- type Page
- func (this *Page) AddField(key string, value string)
- func (this *Page) AddTargetRequest(url string, respType string) *Page
- func (this *Page) AddTargetRequestWithHeaderFile(url string, respType string, headerFile string) *Page
- func (this *Page) AddTargetRequestWithParams(req *request.Request) *Page
- func (this *Page) AddTargetRequestWithProxy(url string, respType string, proxyHost string) *Page
- func (this *Page) AddTargetRequests(urls []string, respType string) *Page
- func (this *Page) AddTargetRequestsWithParams(reqs []*request.Request) *Page
- func (this *Page) AddTargetRequestsWithProxy(urls []string, respType string, proxyHost string) *Page
- func (this *Page) Errormsg() string
- func (this *Page) GetBodyStr() string
- func (this *Page) GetCookies() []*http.Cookie
- func (this *Page) GetHeader() http.Header
- func (this *Page) GetHtmlParser() *goquery.Document
- func (this *Page) GetJson() *simplejson.Json
- func (this *Page) GetPageItems() *page_items.PageItems
- func (this *Page) GetRequest() *request.Request
- func (this *Page) GetSkip() bool
- func (this *Page) GetTargetRequests() []*request.Request
- func (this *Page) GetUrlTag() string
- func (this *Page) IsSucc() bool
- func (this *Page) ResetHtmlParser() *goquery.Document
- func (this *Page) SetBodyStr(body string) *Page
- func (this *Page) SetCookies(cookies []*http.Cookie)
- func (this *Page) SetHeader(header http.Header)
- func (this *Page) SetHtmlParser(doc *goquery.Document) *Page
- func (this *Page) SetJson(js *simplejson.Json) *Page
- func (this *Page) SetRequest(r *request.Request) *Page
- func (this *Page) SetSkip(skip bool)
- func (this *Page) SetStatus(isfail bool, errormsg string)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Page ¶
type Page struct {
// contains filtered or unexported fields
}
Page represents an entity to be crawled.
func (*Page) AddTargetRequest ¶
AddTargetRequest adds one new Request waiting to be crawled.
func (*Page) AddTargetRequestWithHeaderFile ¶
func (this *Page) AddTargetRequestWithHeaderFile(url string, respType string, headerFile string) *Page
AddTargetRequestWithHeaderFile adds one new Request, with a header file, waiting to be crawled.
func (*Page) AddTargetRequestWithParams ¶
AddTargetRequestWithParams adds one new Request waiting to be crawled. The respType is "html", "json", "jsonp", or "text". The urltag is a name that marks the URL and distinguishes different URLs in PageProcesser and Pipeline. The method is POST or GET. The postdata is the HTTP body string. The header is the HTTP header. The cookies are the HTTP cookies.
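The parameters listed above can be pictured as a plain struct. The following is only a sketch: `requestSpec` and its `validate` method are hypothetical names introduced here for illustration and are not part of this package or of `request.Request`, whose constructor is not shown on this page. It simply checks the two enumerated values (respType and method) against the ones documented above.

```go
package main

import (
	"fmt"
	"net/http"
)

// requestSpec is a hypothetical stand-in for the parameters described above.
// It is NOT the package's request.Request type.
type requestSpec struct {
	URL      string
	RespType string // "html", "json", "jsonp" or "text"
	URLTag   string // label used to tell URLs apart downstream
	Method   string // "GET" or "POST"
	PostData string // HTTP body string, relevant for POST
	Header   http.Header
	Cookies  []*http.Cookie
}

// validate checks the two enumerated fields against the documented values.
func (r requestSpec) validate() error {
	switch r.RespType {
	case "html", "json", "jsonp", "text":
	default:
		return fmt.Errorf("unsupported respType %q", r.RespType)
	}
	switch r.Method {
	case http.MethodGet, http.MethodPost:
	default:
		return fmt.Errorf("unsupported method %q", r.Method)
	}
	return nil
}

func main() {
	spec := requestSpec{
		URL:      "http://example.com/api/list", // placeholder URL
		RespType: "json",
		URLTag:   "list",
		Method:   http.MethodPost,
		PostData: "page=1",
		Header:   http.Header{"Content-Type": {"application/x-www-form-urlencoded"}},
		Cookies:  []*http.Cookie{{Name: "session", Value: "abc"}},
	}
	if err := spec.validate(); err != nil {
		fmt.Println("invalid spec:", err)
		return
	}
	fmt.Println("spec ok:", spec.Method, spec.URL, "tag:", spec.URLTag)
}
```

In real use, a value of type `*request.Request` built by the request package would be passed to AddTargetRequestWithParams instead of this stand-in.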
func (*Page) AddTargetRequestWithProxy ¶
AddTargetRequestWithProxy adds one new Request, with a proxy host, waiting to be crawled.
func (*Page) AddTargetRequests ¶
AddTargetRequests adds new Requests waiting to be crawled.
func (*Page) AddTargetRequestsWithParams ¶
AddTargetRequestsWithParams adds new Requests waiting to be crawled.
func (*Page) AddTargetRequestsWithProxy ¶
func (this *Page) AddTargetRequestsWithProxy(urls []string, respType string, proxyHost string) *Page
AddTargetRequestsWithProxy adds new Requests, with a proxy host, waiting to be crawled.
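Because the AddTargetRequest family of methods returns `*Page` in the index above, calls can presumably be chained. The sketch below illustrates that pattern with a hypothetical stand-in type (`page`, `target`, and the lower-case method names are assumptions made for this example); it models only the queueing behaviour and does not reproduce the real Page implementation.

```go
package main

import "fmt"

// target is a hypothetical stand-in for a queued request.
type target struct {
	URL      string
	RespType string
}

// page is a hypothetical stand-in for Page; it only models the queue.
type page struct {
	targets []target
}

// addTargetRequest queues one URL and returns the receiver so calls can be chained.
func (p *page) addTargetRequest(url, respType string) *page {
	p.targets = append(p.targets, target{URL: url, RespType: respType})
	return p
}

// addTargetRequests queues several URLs that share one respType.
func (p *page) addTargetRequests(urls []string, respType string) *page {
	for _, u := range urls {
		p.addTargetRequest(u, respType)
	}
	return p
}

func main() {
	p := &page{}
	// Placeholder URLs; one HTML page followed by two JSON endpoints.
	p.addTargetRequest("http://example.com/", "html").
		addTargetRequests([]string{
			"http://example.com/a.json",
			"http://example.com/b.json",
		}, "json")

	for _, t := range p.targets {
		fmt.Println(t.RespType, t.URL)
	}
}
```

With the real package, the same shape would apply inside a PageProcesser: call the Add methods on the `*Page` received for the current page, then let the framework collect them through GetTargetRequests.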
func (*Page) GetBodyStr ¶
GetBodyStr returns the crawled body as a plain string.
func (*Page) GetCookies ¶
GetCookies returns the cookies of the HTTP response.
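The cookies are exposed as the standard library type `[]*http.Cookie`. As a minimal sketch of working with such a slice, independent of this package, the example below builds a response with two `Set-Cookie` headers, reads them back with `(*http.Response).Cookies`, and flattens them into a map. The helper names `cookieMap` and `sampleCookies` and the cookie values are invented for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// cookieMap flattens a cookie slice into name -> value; later duplicates win.
func cookieMap(cookies []*http.Cookie) map[string]string {
	m := make(map[string]string, len(cookies))
	for _, c := range cookies {
		m[c.Name] = c.Value
	}
	return m
}

// sampleCookies builds an in-memory response carrying two Set-Cookie headers
// and returns the cookies parsed from it as a map.
func sampleCookies() map[string]string {
	resp := &http.Response{Header: http.Header{}}
	resp.Header.Add("Set-Cookie", "session=abc123; Path=/")
	resp.Header.Add("Set-Cookie", "lang=en")
	return cookieMap(resp.Cookies())
}

func main() {
	for name, value := range sampleCookies() {
		fmt.Printf("%s=%s\n", name, value)
	}
}
```

The slice returned by GetCookies could be passed to a helper like `cookieMap` in the same way.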
func (*Page) GetHtmlParser ¶
GetHtmlParser returns the goquery object bound to the target crawl result.
func (*Page) GetPageItems ¶
func (this *Page) GetPageItems() *page_items.PageItems
GetPageItems returns the PageItems object that records the key-value pairs parsed in PageProcesser.
func (*Page) GetRequest ¶
GetRequest returns the request object of this page.
func (*Page) GetTargetRequests ¶
GetTargetRequests returns the target requests that will be put into the Scheduler.
func (*Page) ResetHtmlParser ¶
ResetHtmlParser returns the goquery object bound to the target crawl result.
func (*Page) SetBodyStr ¶
SetBodyStr saves the crawled body as a plain string in Page.
func (*Page) SetCookies ¶
SetCookies saves the cookies of the HTTP response.
func (*Page) SetHtmlParser ¶
SetHtmlParser saves the goquery object bound to the target crawl result.
func (*Page) SetRequest ¶
SetRequest saves the request object of this page.