gowap

Version: v0.0.0-...-79a2231 | Published: Apr 24, 2023 | License: Apache-2.0

Repository

github.com/ddml/gowap

README

Gowap [Wappalyzer implementation in Go]

JS analysing (using Rod)
DNS scraping
Confidence rate
Recursive crawling
Rod browser integration (Colly can still be used: faster, but does not load JS)
Can be used as a command-line tool (technologies.json file embedded)
Test coverage 100%
robots.txt compliance
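
To illustrate the confidence rate, here is a minimal sketch of how a Wappalyzer-style pattern string can be split into its regular expression and its optional `confidence` and `version` tags, which Wappalyzer separates with `\;`. The `pattern` type and the `parsePattern` and `match` names are assumptions made for this example; Gowap's own parsing may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// pattern is a hypothetical type for this sketch, not a gowap type.
type pattern struct {
	re         *regexp.Regexp
	confidence int    // percentage; 100 when the pattern carries no tag
	version    string // version template such as `\1`; may be empty
}

// parsePattern splits a Wappalyzer-style pattern on `\;` and reads the
// optional `confidence:` and `version:` tags that follow the regex.
func parsePattern(raw string) (pattern, error) {
	parts := strings.Split(raw, `\;`)
	re, err := regexp.Compile("(?i)" + parts[0])
	if err != nil {
		return pattern{}, err
	}
	p := pattern{re: re, confidence: 100}
	for _, tag := range parts[1:] {
		key, value, found := strings.Cut(tag, ":")
		if !found {
			continue
		}
		switch key {
		case "confidence":
			if n, err := strconv.Atoi(value); err == nil {
				p.confidence = n
			}
		case "version":
			p.version = value
		}
	}
	return p, nil
}

// match reports whether input matches p and, if so, the detected version
// (with `\1` replaced by the first capture group) and the confidence.
func (p pattern) match(input string) (version string, confidence int, ok bool) {
	m := p.re.FindStringSubmatch(input)
	if m == nil {
		return "", 0, false
	}
	version = p.version
	if len(m) > 1 {
		version = strings.ReplaceAll(version, `\1`, m[1])
	}
	return version, p.confidence, true
}

func main() {
	p, err := parsePattern(`jquery-(\d+(?:\.\d+)*)\;confidence:50\;version:\1`)
	if err != nil {
		panic(err)
	}
	version, confidence, ok := p.match("/static/jquery-3.6.0.min.js")
	fmt.Println(version, confidence, ok) // 3.6.0 50 true
}
```

A pattern without a `confidence` tag counts as 100 in this sketch, so only weaker signals need to be marked explicitly.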

Usage

Using the package

go get github.com/unstppbl/gowap

Call the Init() function with a Config object created by NewConfig(). It returns a Wappalyzer object, on which you can call the Analyze method with a URL string as its argument.

    // Create a Config object and customize it
    config := gowap.NewConfig()
    // Path to a custom technologies.json file that overrides the default one
    config.AppsJSONPath = "path/to/my/technologies.json"
    // Timeout in seconds for fetching the URL
    config.TimeoutSeconds = 5
    // Timeout in seconds for loading the page
    config.LoadingTimeoutSeconds = 5
    // Do not analyze pages deeper than this number. The default (0) disables recursion (only the first page is analyzed)
    config.MaxDepth = 2
    // Maximum number of pages to visit; stop when it is reached
    config.MaxVisitedLinks = 10
    // Delay in ms between requests
    config.MsDelayBetweenRequests = 200
    // Choose the scraper: "rod" (default) or "colly"
    config.Scraper = "colly"
    // Override the user-agent string
    config.UserAgent = "GoWap"
    // Output as a JSON string
    config.JSON = true

    // Initialisation
    wapp, err := gowap.Init(config)
    if err != nil {
        log.Fatal(err)
    }
    // Scraping
    url := "https://scrapethissite.com/"
    res, err := wapp.Analyze(url)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(res)

Using the cmd

You can build the cmd with the command: go build -o gowap cmd/gowap/main.go

Then run the compiled binary:

You must specify a url to analyse
Usage : gowap [options] <url>
  -delay int
    	Delay in ms between requests (default 100)
  -depth int
    	Don't analyze page when depth superior to this number. Default (0) means no recursivity (only first page will be analyzed)
  -file string
    	Path to override default technologies.json file
  -h	Help
  -loadtimeout int
    	Timeout in seconds for loading the page (default 3)
  -maxlinks int
    	Max number of pages to visit. Exit when reached (default 5)
  -pretty
    	Pretty print json output
  -scraper string
    	Choose scraper between rod (default) and colly (default "rod")
  -timeout int
    	Timeout in seconds for fetching the url (default 3)
  -useragent string
    	Override the user-agent string
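
The help text above has the layout produced by Go's standard `flag` package. As a rough sketch of how such options could be declared, the following uses the same flag names and defaults; the `options` struct and the `parseArgs` function are assumptions made for this example and are not taken from Gowap's cmd/gowap/main.go. The `-h` flag is left to the `flag` package's built-in help handling.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options mirrors the flags in the help text; the struct and its field
// names are hypothetical.
type options struct {
	delay, depth, loadTimeout, maxLinks, timeout int
	file, scraper, userAgent, url                string
	pretty                                       bool
}

// parseArgs parses command-line arguments (without the program name)
// and requires exactly one positional URL.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gowap", flag.ContinueOnError)
	fs.IntVar(&o.delay, "delay", 100, "Delay in ms between requests")
	fs.IntVar(&o.depth, "depth", 0, "Don't analyze page when depth superior to this number")
	fs.StringVar(&o.file, "file", "", "Path to override default technologies.json file")
	fs.IntVar(&o.loadTimeout, "loadtimeout", 3, "Timeout in seconds for loading the page")
	fs.IntVar(&o.maxLinks, "maxlinks", 5, "Max number of pages to visit. Exit when reached")
	fs.BoolVar(&o.pretty, "pretty", false, "Pretty print json output")
	fs.StringVar(&o.scraper, "scraper", "rod", "Choose scraper between rod (default) and colly")
	fs.IntVar(&o.timeout, "timeout", 3, "Timeout in seconds for fetching the url")
	fs.StringVar(&o.userAgent, "useragent", "", "Override the user-agent string")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, errors.New("you must specify a url to analyse")
	}
	o.url = fs.Arg(0)
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Because the `flag` package stops parsing at the first non-flag argument, options must come before the URL, as in the `gowap [options] <url>` usage line.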

To Do

Some ideas:

analyse robots (field robots)
analyse certificates (field certIssuer)
analyse css (field css)
analyse xhr requests (field xhr)
scrape a list of URLs from a file passed as an argument
ability to choose what is scraped (DNS, cookies, HTML, scripts, etc.)
more "real-life" tests
performance: HTML regex matching seems slow
should the output match the original Wappalyzer's, including ordering?

Directories

Path
cmd/gowap
pkg/core
pkg/scraper
