Documentation ¶
Index ¶
- Constants
- func ExtractJsonListFromFile(srcFile string) ([]map[string]interface{}, error)
- func GetFullLink(ctx context.Context, itemSel string) (hrefUrl string, err error)
- func GetProxyAddress(ctx context.Context, url string) (rtn string, err error)
- func GetProxyAddressArray(ctx context.Context, url string) (rtn []string, err error)
- func GetProxyAddressString(obj ProxyInfo) (result string)
- func GetProxyArrayWithConvertFunc(ctx context.Context, url string, convertFunc ConvertFunc) (rtn []string, err error)
- func GetSocks5Proxy(ctx context.Context, url string) (rtn string, err error)
- func GetSocks5ProxyArray(ctx context.Context, url string) (rtn []string, err error)
- func GetSocks5ProxyUrl(obj ProxyInfo) (result string)
- func GetTbodyDom(content string) (*goquery.Document, error)
- func GetTheadDom(content string) (*goquery.Document, error)
- func NewChromedpContext(ctx context.Context) (context.Context, context.CancelFunc)
- func RecordData(data interface{}) (err error)
- func RunWithCrawler(ctx context.Context, crawler Crawler)
- func SetRecordWriter(writer ioutils.Writer)
- func StartRecoder(ctx context.Context, writer ioutils.Writer)
- func ToCrawl(action CrawlAction)
- type ConvertFunc
- type CrawlAction
- type Crawler
- type ProxyInfo
- type ProxyResponse
- type Root
Constants ¶
const COMPLETE_OPT_SIZE = 128
const PARA_OPT_SIZE = 16
Variables ¶
This section is empty.
Functions ¶
func ExtractJsonListFromFile ¶ added in v0.1.7
func GetFullLink ¶ added in v0.1.3
GetFullLink uses chromedp.Run to obtain the full link. The target element must itself carry an href attribute.
func GetProxyAddress ¶ added in v0.1.9
Gets a single SOCKS5 proxy address.
func GetProxyAddressArray ¶ added in v0.1.9
Gets a list of SOCKS5 proxy addresses.
func GetProxyAddressString ¶ added in v0.1.9
func GetProxyArrayWithConvertFunc ¶ added in v0.1.9
func GetProxyArrayWithConvertFunc(ctx context.Context, url string, convertFunc ConvertFunc) (rtn []string, err error)
Gets a list of SOCKS5 proxy addresses.
func GetSocks5Proxy ¶
Gets a single SOCKS5 proxy address.
func GetSocks5ProxyArray ¶ added in v0.1.8
Gets a list of SOCKS5 proxy addresses.
func GetSocks5ProxyUrl ¶ added in v0.1.8
func GetTbodyDom ¶ added in v0.1.3
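For a tbody element, this replacement must be applied before goquery can read and parse the content correctly. When the result is read, the element is still identified as tbody.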
func GetTheadDom ¶ added in v0.1.6
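For a thead element, this replacement must be applied before goquery can read and parse the content correctly. When the result is read, the element must be identified as tbody instead.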
func NewChromedpContext ¶ added in v0.1.0
func RecordData ¶
func RecordData(data interface{}) (err error)
Writes a byte slice that is already JSON-serialized, a string, or any object that can be JSON-serialized.
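The three accepted input kinds can be illustrated with a minimal sketch. It is not the package's implementation: `toJSONBytes` is a hypothetical helper, and the assumption is that byte slices and strings are passed through unchanged while any other value goes through `json.Marshal`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSONBytes is a hypothetical helper (not part of the package).
// Assumption: []byte and string inputs are treated as already
// serialized JSON; everything else is marshalled.
func toJSONBytes(data interface{}) ([]byte, error) {
	switch v := data.(type) {
	case []byte:
		return v, nil
	case string:
		return []byte(v), nil
	default:
		return json.Marshal(v)
	}
}

func main() {
	inputs := []interface{}{
		[]byte(`{"id":1}`),      // pre-serialized bytes
		`{"id":2}`,              // pre-serialized string
		map[string]int{"id": 3}, // object that needs marshalling
	}
	for _, in := range inputs {
		b, err := toJSONBytes(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(string(b))
	}
}
```

Under this assumption, a caller can hand over raw scraped JSON and Go structs through the same entry point without serializing twice.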
func RunWithCrawler ¶
func SetRecordWriter ¶ added in v0.2.0
func StartRecoder ¶ added in v0.0.8
Starts the data-recording service so that data can be written from anywhere in the program.
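One common way to build such a service is a single goroutine that drains a buffered channel into a writer. The sketch below shows that pattern only. The `recorder` type, its methods, and the buffer size are hypothetical, and it uses the standard `io.Writer` in place of `ioutils.Writer`, whose definition is not shown here.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
)

// recorder is a hypothetical stand-in for the package's recording service.
type recorder struct {
	ch   chan []byte
	done chan struct{}
}

// startRecorder launches one goroutine that writes each queued record
// to w as a line, until ctx is cancelled or the queue is closed.
func startRecorder(ctx context.Context, w io.Writer, size int) *recorder {
	r := &recorder{ch: make(chan []byte, size), done: make(chan struct{})}
	go func() {
		defer close(r.done)
		for {
			select {
			case <-ctx.Done():
				return
			case line, ok := <-r.ch:
				if !ok {
					return // queue closed and fully drained
				}
				fmt.Fprintf(w, "%s\n", line)
			}
		}
	}()
	return r
}

// record queues one record; it blocks only when the buffer is full.
func (r *recorder) record(line []byte) { r.ch <- line }

// close stops accepting records and waits for pending writes to finish.
func (r *recorder) close() {
	close(r.ch)
	<-r.done
}

// demo records two lines and returns everything that was written.
func demo() string {
	var buf bytes.Buffer
	r := startRecorder(context.Background(), &buf, 8)
	r.record([]byte(`{"id":1}`))
	r.record([]byte(`{"id":2}`))
	r.close()
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```

Funnelling all writes through one goroutine means callers never touch the writer concurrently, which is the usual reason for this design.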
func ToCrawl ¶
func ToCrawl(action CrawlAction)
Types ¶
type ConvertFunc ¶ added in v0.1.9
type CrawlAction ¶
type CrawlAction = func(proxyReqUrl string, isHeadless bool, customMap map[string]string, outputFile string)
func GetAction ¶
func GetAction(crawler Crawler) CrawlAction
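As a sketch of what a value of this type looks like, the example below redeclares `CrawlAction` locally with the signature shown above and builds one action by hand. `newLoggingAction`, the URL, the `keyword` map key, and the output file name are all hypothetical. How `ToCrawl` supplies the four arguments is not documented here.

```go
package main

import "fmt"

// CrawlAction is redeclared locally with the documented signature
// so that this sketch compiles on its own.
type CrawlAction = func(proxyReqUrl string, isHeadless bool, customMap map[string]string, outputFile string)

// newLoggingAction is a hypothetical constructor. The returned action
// only records the arguments it receives instead of crawling anything.
func newLoggingAction(log *[]string) CrawlAction {
	return func(proxyReqUrl string, isHeadless bool, customMap map[string]string, outputFile string) {
		*log = append(*log, fmt.Sprintf("proxy=%s headless=%t keyword=%s out=%s",
			proxyReqUrl, isHeadless, customMap["keyword"], outputFile))
	}
}

// demo invokes the action once with placeholder arguments.
func demo() []string {
	var log []string
	action := newLoggingAction(&log)
	action("http://127.0.0.1:8080/proxy", true,
		map[string]string{"keyword": "golang"}, "out.jsonl")
	return log
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

Because `CrawlAction` is a type alias for a plain function type, any closure with a matching signature can be passed wherever a `CrawlAction` is expected.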
type ProxyResponse ¶