Roving
Distributed fuzzing using AFL.
Overview
AFL
AFL is a "fuzzer". You give it a target program, and it runs that target
program zillions of times, trying to find input that causes it to crash.
It uses instrumentation of the target program's code to try to manipulate its
input so that it explores as much of the target program as possible.
Roving
A roving cluster runs multiple copies of AFL on multiple machines, all
fuzzing the same target. Roving's key contribution is to allow these
machines to share and benefit from each other's work. If machine A finds
an "interesting" test case that causes a new function to get invoked,
machines B, C and D can all use this discovery to explore the rest of
the program more efficiently.
Cluster structure
A roving cluster consists of 1 server and N clients. Each client runs M
copies of AFL (using AFL's existing parallelism settings) and uses the
server to share its work with its peers. Each fuzzer on a client
periodically (by default, every 5 minutes) uploads its current AFL
state, including its queue, to the server. The server saves these states
in memory.
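The upload side of this can be sketched as follows. This is only an illustration, not Roving's actual code: `FuzzerState`, `InMemoryStateStore`, and `sync_due` are hypothetical names, and the sketch assumes that a newer upload from a fuzzer simply replaces its older one.

```python
import time
from dataclasses import dataclass, field

SYNC_INTERVAL_SECONDS = 5 * 60  # the default interval described above


@dataclass
class FuzzerState:
    """Hypothetical snapshot of one fuzzer's AFL state."""
    queue: list  # test cases, each stored as bytes
    uploaded_at: float = field(default_factory=time.time)


class InMemoryStateStore:
    """Keeps the most recent upload per fuzzer in a plain dict."""

    def __init__(self):
        self._states = {}

    def upload(self, fuzzer_id, state):
        # Assumption: a newer upload replaces the previous one.
        self._states[fuzzer_id] = state

    def all_states(self):
        # Return a copy so callers cannot mutate the store's dict.
        return dict(self._states)


def sync_due(last_sync, now, interval=SYNC_INTERVAL_SECONDS):
    """True once at least `interval` seconds have passed since the last sync."""
    return now - last_sync >= interval
```

Because the store lives only in memory in this sketch, restarting the process would lose every uploaded state.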
Fuzzers take advantage of the work of their peers by downloading
from the server the state of all clients in the cluster. They replace
their current queue with the combined queues of all clients, and then
continue fuzzing as before. This allows all clients to benefit from the
new, interesting test cases that any individual client discovers.
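A minimal sketch of the queue-combining step is shown below. The function name `merge_queues` is hypothetical, and dropping byte-identical test cases by content hash is an assumption made for the example; the text above only says that the queues are combined.

```python
import hashlib


def merge_queues(client_queues):
    """Combine per-client queues into one list, keeping one copy of each test case.

    `client_queues` maps a client id to a list of test cases (bytes).
    """
    merged = {}
    # Iterate in sorted order so the result does not depend on dict ordering.
    for client_id in sorted(client_queues):
        for testcase in client_queues[client_id]:
            digest = hashlib.sha256(testcase).hexdigest()
            # Assumption: byte-identical test cases are kept only once.
            merged.setdefault(digest, testcase)
    return list(merged.values())


queues = {
    "client-a": [b"seed", b"AAAA"],
    "client-b": [b"seed", b"BBBB"],
}
combined = merge_queues(queues)  # this list would replace a client's local queue
```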
This approach relies on the non-determinism of AFL. If every client
deterministically ran the same test cases when given the same queue,
we would simply be repeating the same work N times across N different
clients. In reality, clients take the same queue and run in wildly
different directions with it. This means that we cover more of the search
space, faster.
That said, there is no formal partitioning of work, and there will be
some amount of duplication of work between clients. We do not currently
have any estimates of how much work is duplicated, but it is safe to say
that running 10 roving clients will not get you 10x the edge-discovery
rate of 1 client. Roving uses the same principle as AFL's own
single-machine parallelism, so we still have good reason to believe
that it is effective.
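To see why duplicated work makes scaling sub-linear, consider a deliberately crude toy model, which is not a measurement of Roving or AFL: each client makes a fixed number of independent, uniformly random "draws" from a fixed pool of edges. The function name and all of the numbers below are made up for illustration.

```python
def expected_unique_edges(total_edges, draws_per_client, clients):
    """Expected number of distinct edges hit under uniform, independent draws.

    Toy model only: real fuzzers do not explore edges uniformly at random.
    """
    total_draws = draws_per_client * clients
    # Probability that one particular edge is missed by every single draw.
    miss_probability = (1 - 1 / total_edges) ** total_draws
    return total_edges * (1 - miss_probability)


one_client = expected_unique_edges(total_edges=1000, draws_per_client=100, clients=1)
ten_clients = expected_unique_edges(total_edges=1000, draws_per_client=100, clients=10)
speedup = ten_clients / one_client  # greater than 1, but well short of 10
```

In this model, ten clients still cover more than one client does, but some of their draws land on edges that another client has already hit, so the gain falls short of 10x.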
Usage
Bazel
For now roving uses [Bazel](https://docs.bazel.build/versions/master/install.html)
for its build. You'll need to install it in order to build roving.
Roving Server
- Export AFL with the path to afl, or make sure afl-fuzz is on PATH
- In the workdir, create a target binary [optional]
- In the workdir, make a directory called input and populate it with a corpus
- Run bazel build //cmd/srv
- Run bazel-bin/cmd/srv/darwin_amd64_stripped/srv
Once up, it will create a directory called output that mirrors the
structure of the output directory created by AFL. It will aggregate
crashes, hangs, and the queue.
There is also a basic (but improving!) admin page at SERVER_URL:SERVER_PORT/admin.
Roving Clients
Clients should require almost no configuration.
- Run bazel build //cmd/client
- Run bazel-bin/cmd/client/darwin_amd64_stripped/client -- -server-hostport XYZ:123 -parallelism X
Clients will accumulate crashes and hangs in their working dir. They will
sync them to the server.
Advanced usage
Run the compiled binaries with the -help flag or see the files in the cmd/
folder for advanced options.
Development
Tests
The test suite is not particularly extensive, but you can run it
using:
bin/test
Design principles
Roving clients should be very dumb
Roving clients should be very dumb and have very
little configuration. This is so that clients can easily
be brought up, pointed at any roving server of any type, and
quickly start working.
If a roving server requires clients to be configured
in a particular way (perhaps the server wants them
to sync their work with it more frequently than normal),
this should be passed as configuration to the server,
which should then send it to the client when it starts up
and joins the cluster.
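One way to picture this principle is sketched below. It is an assumption-laden illustration rather than Roving's real protocol: `ClientConfig`, `Server.handle_join`, and `Client.join` are hypothetical names, and the only setting shown is a sync interval.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical settings a server hands to every client that joins."""
    sync_interval_seconds: int = 300  # 5 minutes, the default mentioned earlier


class Server:
    def __init__(self, client_config):
        # Configuration is given to the server, never to individual clients.
        self._client_config = client_config

    def handle_join(self):
        # Sent to each client when it starts up and joins the cluster.
        return asdict(self._client_config)


class Client:
    def __init__(self):
        self.config = None  # nothing is configured locally

    def join(self, server):
        # The client adopts whatever settings the server sends back.
        self.config = ClientConfig(**server.handle_join())


server = Server(ClientConfig(sync_interval_seconds=60))  # server wants faster syncs
client = Client()
client.join(server)
```

The client in this sketch needs to know only which server to talk to; everything else arrives when it joins.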
Fuzzer-agnosticism is good but currently not essential
We would like roving to be fuzzer-agnostic in the future. It should be
possible to power your fuzzing using afl, libfuzzer, honggfuzz, or
any other reasonable fuzzer.
All of these fuzzers work in somewhat different ways and have somewhat
different structures and opinions. We are comfortable loosely coupling
ourselves to afl for now; for example, we assume that fuzzer input and
output is structured in the way that afl expects. However, we would like
multi-fuzzer support to be an achievable goal in the future, and would like
to avoid making decisions that would make this unreasonably difficult.
Running the examples
The example bash scripts live in the examples/ directory.
C
- examples/c-server to build the target and run the example server, serving the C example target on the default port 1414
- examples/generic-client to run the example client
Your client should find a crash within 30 seconds.
Ruby
- Install afl-ruby
- examples/ruby-server to run the example server, serving the Ruby example target on the default port 1414
- examples/generic-client to run the example client
Your client should again find a crash within 30 seconds.
Why Roving?
I asked some of my coworkers what they'd name a distributed fuzzy thing.
Evidently roving is extremely fuzzy, and winds up everywhere when
you're working with it. Plus the testcases go roving and it's all very poetic.
Credit
- Stripe has substantially contributed to Roving, by directly supporting its development in paid time, as well as contributing that development back to the open source project.
- Rob Heaton spent huge amounts of time adding features, finding bugs, and documenting the project.