crawlerx

package v1.2.2-sp4

Published: Jul 13, 2023 License: AGPL-3.0 Imports: 3 Imported by: 0

README

CrawlerX Crawler Module Usage Guide

chan, err = crawlerx.StartCrawler(url string, opts ...configopt) creates the crawler and returns the result output channel and an error

err = crawlerx.StartCrawlerV2(url string, opts ...configopt) is the new crawler entry point; it does not return crawler results directly, they are delivered through the proxy channel instead

configopt values are optional crawler parameters; the following options are available:

configopt = crawlerx.proxy(url string, userInfo ...string) sets the proxy; an optional username and password may follow the proxy URL, matching Crawler.SetProxy(proxyAddr string, proxyInfo ...string) below
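A minimal sketch, assuming the optional username/password form implied by Crawler.SetProxy (the address and credentials are illustrative):

proxyConfig = crawlerx.proxy("http://127.0.0.1:8083", "admin", "password")
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", proxyConfig)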

configopt = crawlerx.maxUrl(int) sets the maximum number of URLs to crawl

configopt = crawlerx.whiteList(string) sets a whitelist keyword; regular expressions are supported

configopt = crawlerx.blackList(string) sets a blacklist keyword; regular expressions are supported

configopt = crawlerx.timeout(int) sets the per-page timeout; default 30s

configopt = crawlerx.maxDepth(int) sets the maximum crawl depth; default 3 levels

configopt = crawlerx.formFill(key string, value string) sets custom form input; when an input field's identifying text contains key, value is entered by default
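A hedged sketch of form filling (the keywords and values are illustrative):

userFill = crawlerx.formFill("user", "admin")
passFill = crawlerx.formFill("pass", "123456")
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", userFill, passFill)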

configopt = crawlerx.header(key string, value string) sets a request header as a single entry

configopt = crawlerx.headers(map[string]string) sets request headers in map form
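A sketch combining both header forms (the header names and values are illustrative, and the map literal assumes yak's {"key": "value"} syntax):

singleHeader = crawlerx.header("Accept-Language", "en-US")
multiHeaders = crawlerx.headers({"User-Agent": "CustomUA/1.0", "X-Test": "1"})
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", singleHeader, multiHeaders)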

configopt = crawlerx.concurrent(int) sets the maximum number of pages crawled in parallel; default 20

configopt = crawlerx.cookie(domain string, key string, value string) sets a cookie as a single entry

configopt = crawlerx.cookies(domain string, value map[string]string) sets cookies in map form
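A sketch of both cookie forms (the domain and values are illustrative, and the map literal assumes yak's {"key": "value"} syntax):

sessionCookie = crawlerx.cookie("testphp.vulnweb.com", "session", "abc123")
moreCookies = crawlerx.cookies("testphp.vulnweb.com", {"token": "xyz", "lang": "en"})
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", sessionCookie, moreCookies)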

configopt = crawlerx.checkDanger() makes the crawler avoid clicking dangerous URLs; a URL is skipped automatically when certain keywords are detected in it

configopt = crawlerx.tags(tagpath string) sets the tag file path; once a tag file is set, tag information can be extracted from the URL packet info

configopt = crawlerx.fullTimeout(timeout int) sets the global maximum timeout; 0 means no limit; default 360 seconds
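A sketch combining the per-page and global timeouts (the values are illustrative):

pageTimeout = crawlerx.timeout(60)
globalTimeout = crawlerx.fullTimeout(600)
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", pageTimeout, globalTimeout)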

configopt = crawlerx.chromeWS(wsAddress string) sets the address for connecting to a remote Chrome instance

configopt = crawlerx.remote(bool) sets whether crawler results are fetched remotely

configopt = crawlerx.extraHeaders(headers ...string) sets extra headers; they are applied when a page is created and added to every request that page generates, for example:

crawlerx.extraHeaders("anoTestHeaders", "anotherExtraHeaders")


configopt = crawlerx.scanRange(int) sets the crawl scope (a sketch follows this list), where

crawlerx.AllDomainScan crawls the whole domain

crawlerx.SubMenuScan crawls the target URL and its subdirectories

The default is crawlerx.AllDomainScan.
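For example, restricting the crawl to the target URL and its subdirectories (a minimal sketch):

rangeConfig = crawlerx.scanRange(crawlerx.SubMenuScan)
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", rangeConfig)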


configopt = crawlerx.scanRepeat(int) sets the duplicate-URL detection level (a sketch follows this list). Taking the URL http://www.abc.com/test.php?login=admin as an example:

page = http://www.abc.com/test.php

method = GET

query-name = login

query-value = admin

crawlerx.HighRepeatLevel is sensitive to page only

crawlerx.MediumRepeatLevel is sensitive to page and method

crawlerx.LowRepeatLevel is sensitive to page, method, and query-name

crawlerx.UnLimitRepeat is sensitive to page, method, query-name, and query-value

The default is crawlerx.UnLimitRepeat.
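For example, treating every request to the same page as a duplicate regardless of method or query (a minimal sketch):

repeatConfig = crawlerx.scanRepeat(crawlerx.HighRepeatLevel)
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", repeatConfig)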


When reading crawler results from the channel, the values received provide the following methods:

Url() string the URL

Method() string the request method

RequestHeaders() map[string]string the request headers

RequestBody() string the request body

ResponseHeaders() map[string][]string the response headers

ResponseBody() string the response body

Tag() []string the tag list

Usage example:

blackConfig = crawlerx.blackList("cart")
maxUrlConfig = crawlerx.maxUrl(30)
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", blackConfig, maxUrlConfig)
for item = range ch {
    println(item.Url())
}
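A variant that inspects more of each result (a sketch; which accessors to call is up to you):

ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/")
for item = range ch {
    println(item.Method(), item.Url())
    println(item.RequestHeaders())
}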

Tag usage example:

testConfig = crawlerx.tags("/Users/chenyangbao/Project/yak/common/crawlerx/tag/rules/rule.yml")
ch, err = crawlerx.StartCrawler("http://testphp.vulnweb.com/", testConfig)
for item = range ch {
    println(item.Url(), item.Tag())
}

V2 usage example:

depth = crawlerx.maxDepth(5)
proxy = crawlerx.proxy("http://127.0.0.1:8083")
ws = crawlerx.chromeWS("http://192.168.0.115:7317")
remoteUrl = crawlerx.remote(true)
err = crawlerx.StartCrawlerV2("http://testphp.vulnweb.com/", proxy, remoteUrl, ws, depth)

Custom Tags

When tagging URLs, the location of the tag file must be specified manually.

Custom tag files are written in YAML.

A basic tag entry looks like this:

- NAME: file_download_pre_test
  RULES:
    - ORIGIN: response.url_param
      RULE_TYPE: re
      RULE: (path|file|url|Data|src|temp)=
    - ORIGIN: response.url_param
      RULE_TYPE: SCRIPT
      RULE: ORIGIN.lastIndexOf(".")>-1

NAME: the tag name

RULES: the rule definitions; each rule contains:

ORIGIN: the data source, one of:

  • response.url the response URL
  • response.html the HTML content of the response page
  • response.responseHeader the response headers, in map[string]string form
  • response.url_param the URL parameters
  • response.path the URL path

RULE_TYPE: the rule type (case-insensitive), one of:

  • re regex match; RULE is the regular expression to match

  • json map match; the rule additionally contains a KEY entry used to select a key in the map, and RULE is the value that the entry for that key must match, for example:

- NAME: file_download_pre_test
  RULES:
    - ORIGIN: response.responseHeader
      RULE_TYPE: JSON
      KEY: content-disposition
      RULE: attachment
  • script script match; RULE is executable JS whose return value is a bool, for example:
- NAME: http_struts2_url
  RULES:
    - ORIGIN: response.path
      RULE_TYPE: SCRIPT
      RULE: ORIGIN.endsWith(".do")
  • xpath path match; RULE is an element structure that must exist in the HTML page, for example:
- NAME: http_file_upload_pre_test
  RULES:
    - ORIGIN: response.html
      RULE_TYPE: xpath
      RULE: 'input[type=file]'

A tag is applied only when every rule under its RULES evaluates to true.

Page Screenshot

    ws = crawlerx.chromeWS("http://192.168.0.115:7317")
    code, err = crawlerx.PageScreenShot("http://testphp.vulnweb.com/", ws)
    println(code)

Documentation

Index

Constants

This section is empty.

Variables

var CrawlerXExports = map[string]interface{}{

	"StartCrawler":   StartCrawler,
	"StartCrawlerV2": StartCrawlerV2,
	"PageScreenShot": PageScreenShot,

	"proxy":        core.WithProxy,
	"maxUrl":       core.WithMaxUrl,
	"whiteList":    core.WithWhiteList,
	"blackList":    core.WithBlackList,
	"timeout":      core.WithTimeout,
	"maxDepth":     core.WithMaxDepth,
	"formFill":     core.WithFormFill,
	"header":       core.WithHeader,
	"headers":      core.WithHeaders,
	"concurrent":   core.WithConcurrent,
	"cookie":       core.WithCookie,
	"cookies":      core.WithCookies,
	"scanRange":    core.WithScanRange,
	"scanRepeat":   core.WithScanRepeat,
	"checkDanger":  core.WithCheckDanger,
	"tags":         core.WithTags,
	"fullTimeout":  core.WithFullCrawlerTimeout,
	"chromeWS":     core.WithChromeWS,
	"remote":       core.WithGetUrlRemote,
	"extraHeaders": core.WithExtraHeaders,

	"HighRepeatLevel":   detect.HighLevel,
	"MediumRepeatLevel": detect.MediumLevel,
	"LowRepeatLevel":    detect.LowLevel,
	"UnLimitRepeat":     detect.UnLimit,

	"AllDomainScan": detect.AllDomain,
	"SubMenuScan":   detect.SubMenu,
	"TargetUrlScan": detect.TargetUrl,
}

Functions

func PageScreenShot

func PageScreenShot(url string, opts ...core.ConfigOpt) (string, error)

func StartCrawler

func StartCrawler(url string, opts ...core.ConfigOpt) (chan core.ReqInfo, error)

func StartCrawlerV2

func StartCrawlerV2(url string, opts ...core.ConfigOpt) error

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func CreateCrawler

func CreateCrawler(urlStr string) *Crawler

func (*Crawler) GetChannel

func (crawler *Crawler) GetChannel() chan core.ReqInfo

func (*Crawler) Monitor

func (crawler *Crawler) Monitor() error

func (*Crawler) PageScreenShot

func (crawler *Crawler) PageScreenShot() (string, error)

func (*Crawler) SetBlackList

func (crawler *Crawler) SetBlackList(blackRegStr string)

func (*Crawler) SetChromeWS

func (crawler *Crawler) SetChromeWS(wsAddress string)

func (*Crawler) SetConcurrent

func (crawler *Crawler) SetConcurrent(concurrent int)

func (*Crawler) SetCookie

func (crawler *Crawler) SetCookie(domain, k, v string)

func (*Crawler) SetCookies

func (crawler *Crawler) SetCookies(domain string, value map[string]string)

func (*Crawler) SetDangerUrlCheck

func (crawler *Crawler) SetDangerUrlCheck()

func (*Crawler) SetExtraHeaders

func (crawler *Crawler) SetExtraHeaders(headers ...string)

func (*Crawler) SetFormFill

func (crawler *Crawler) SetFormFill(key, value string)

func (*Crawler) SetFullTimeout

func (crawler *Crawler) SetFullTimeout(timeout int)

func (*Crawler) SetHeader

func (crawler *Crawler) SetHeader(key, value string)

func (*Crawler) SetHeaders

func (crawler *Crawler) SetHeaders(kv map[string]string)

func (*Crawler) SetMaxDepth

func (crawler *Crawler) SetMaxDepth(depth int)

func (*Crawler) SetMaxUrl

func (crawler *Crawler) SetMaxUrl(maxUrl int)

func (*Crawler) SetOnRequest

func (crawler *Crawler) SetOnRequest(f func(core.ReqInfo))

func (*Crawler) SetProxy

func (crawler *Crawler) SetProxy(proxyAddr string, proxyInfo ...string)

func (*Crawler) SetScanRange

func (crawler *Crawler) SetScanRange(scanRange int)

func (*Crawler) SetScanRepeatLevel

func (crawler *Crawler) SetScanRepeatLevel(scanRepeat int)

func (*Crawler) SetTags

func (crawler *Crawler) SetTags(tagsPath string)

func (*Crawler) SetTimeout

func (crawler *Crawler) SetTimeout(timeout int)

func (*Crawler) SetUrlFromProxy

func (crawler *Crawler) SetUrlFromProxy(ifYes bool)

func (*Crawler) SetWhiteList

func (crawler *Crawler) SetWhiteList(whiteRegStr string)

func (*Crawler) Start

func (crawler *Crawler) Start() error

func (*Crawler) StartV2

func (crawler *Crawler) StartV2() error

func (*Crawler) StartVRemote

func (crawler *Crawler) StartVRemote() error

Directories

Path            Synopsis
cmd
  sub           Package config https://github.com/unknwon/goconfig
newcrawlerx     Package newcrawlerx @Author bcy2007 2023/5/23 11:02
  cmd           Package cmd @Author bcy2007 2023/3/23 10:50
