onion-grab
A tool that visits a list of domains over HTTPS to see if they have
Onion-Location configured.
Warning: research prototype. The source code may also be moved.
Quickstart
Install
You will need a Go compiler on the local system:
$ which go >/dev/null || echo "Go compiler is not in PATH"
Install onion-grab:
$ go install git.cs.kau.se/rasmoste/onion-grab@latest
List all options:
$ onion-grab -h
Basic usage
Store one domain per line in a file:
$ cat domains.lst
www.eff.org
www.qubes-os.org
www.torproject.org
Run onion-grab with default parameters:
$ onion-grab -i domains.lst
2023/04/07 20:29:45 INFO: ctrl+C to exit prematurely
2023/04/07 20:29:45 INFO: starting 128 workers with limit 64/s
2023/04/07 20:29:45 INFO: starting work receiver
2023/04/07 20:29:45 INFO: starting work generator
www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
2023/04/07 20:29:50 INFO: metrics@receiver:
Processed: 3
Success: 3 (Onion-Location:2)
Failure: 0 (See breakdown below)
Req: 0 (Before sending request)
DNS: 0 (NotFound:0 Timeout:0 Other:0)
TCP: 0 (Timeout:0 Syscall:0)
TLS: 0 (Cert:0 Other:0)
3xx: 0 (Too many redirects)
EOF: 0 (Unclear meaning)
CTX: 0 (Deadline exceeded)
???: 0 (Other errors)
2023/04/07 20:29:51 INFO: about to exit in at most 11s, reading remaining answers
2023/04/07 20:29:57 INFO: metrics@receiver: summary:
Processed: 3
Success: 3 (Onion-Location:2)
Failure: 0 (See breakdown below)
Req: 0 (Before sending request)
DNS: 0 (NotFound:0 Timeout:0 Other:0)
TCP: 0 (Timeout:0 Syscall:0)
TLS: 0 (Cert:0 Other:0)
3xx: 0 (Too many redirects)
EOF: 0 (Unclear meaning)
CTX: 0 (Deadline exceeded)
???: 0 (Other errors)
2023/04/07 20:29:57 INFO: measurement duration was 12s
Sites with Onion-Location are printed to stdout, here showing that
www.torproject.org configures it with an HTTP header while www.qubes-os.org
does it with an HTML attribute. All three sites connected successfully;
www.eff.org is not printed because it does not set Onion-Location.
In case of errors, the type of error is identified in the failure breakdown,
with relatively few errors ending up in the ??? (other errors) category.
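The result lines printed to stdout above follow the pattern
"domain header=VALUE attribute=VALUE". As a minimal sketch, and assuming
that the three fields are separated by whitespace and that neither value
contains a space, such a line could be parsed in Python as shown below. The
helper name parse_result_line is hypothetical and not part of onion-grab.

```python
def parse_result_line(line):
    """Split one result line into (domain, header_value, attribute_value).

    Assumption: fields are whitespace-separated and the values themselves
    contain no spaces. An empty value means that Onion-Location was not
    set in that way.
    """
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    domain, header, attribute = fields
    if not header.startswith("header="):
        raise ValueError(f"missing header= field: {line!r}")
    if not attribute.startswith("attribute="):
        raise ValueError(f"missing attribute= field: {line!r}")
    # Strip the "header=" and "attribute=" prefixes to get the raw values.
    return domain, header[len("header="):], attribute[len("attribute="):]
```

For example, parsing the www.qubes-os.org line above would give an empty
header value and the onion URL as the attribute value.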
Scripts
Digest the results, here stored as onion-grab.stdout:
$ cat onion-grab.stdout
www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
$ ./scripts/digest.py -i onion-grab.stdout
digest.py:25 INFO: found 1 HTTP headers with Onion-Location
digest.py:26 INFO: found 1 HTML meta attributes with Onion-Location
digest.py:27 INFO: found 2 unqiue domain names that set Onion-Location
digest.py:28 INFO: found 2 unique two-label onion addresses in the process
digest.py:30 INFO: storing domains with valid Onion-Location configurations in domains.txt
digest.py:35 INFO: storing two-label onion addresses that domains referenced in onions.txt
$ cat domains.txt
www.qubes-os.org http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
www.torproject.org http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html
$ cat onions.txt
qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion www.qubes-os.org
2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion www.torproject.org
In other words, the digest script prints some information and writes two files:
- domains.txt: domains that have a valid Onion-Location configuration. The
  listed Onion-Location values are de-duplicated and space-separated.
- onions.txt: two-label .onion addresses that were discovered. The domains
  listed after each address referenced it in their Onion-Location
  configuration, possibly with subdomains, paths, etc., that were removed.
  Pruning the configured Onion-Location values this way is useful for
  estimating the number of distinct onions.
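The pruning to two-label addresses described above can be illustrated with
a short Python sketch. It is not the code used by scripts/digest.py; the
function name two_label_onion is hypothetical, and the sketch assumes that
the Onion-Location value is an absolute URL with a scheme. It also does not
check that the address itself is a well-formed onion address.

```python
from urllib.parse import urlsplit


def two_label_onion(url):
    """Return the two-label .onion address of url, or None if there is none.

    Subdomains, ports, paths, and queries are dropped, so that different
    Onion-Location values pointing at the same onion map to the same key.
    """
    host = urlsplit(url).hostname  # lower-cased host without port, or None
    if host is None:
        return None
    labels = host.split(".")
    if len(labels) < 2 or labels[-1] != "onion":
        return None
    # Keep only the last two labels, e.g. "<address>.onion".
    return ".".join(labels[-2:])
```

Collecting the return values in a set gives a rough count of distinct
onions, which is the kind of estimate that onions.txt is meant to support.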
See scripts/test.sh if you are looking to test different onion-grab
configurations. You may also find scripts/measure.sh to be a useful
measurement script.
Running a larger measurement
See docs/operations.md for measurements of Tranco top-1M and ct-sans.
- rasmus (at) rgdd (dot) se
Licence
BSD 2-Clause License