# drchaos
Fuzzing is a technique for automated bug detection that involves providing random inputs
to a target program to induce crashes. This approach can increase test coverage, enabling
the identification of edge cases and more efficient triggering of bugs.
Drchaos extends the Nim interface to LLVM/Clang libFuzzer, an in-process, coverage-guided,
and evolutionary fuzzing engine, while also introducing support for
[structured fuzzing](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md).
To use this functionality, specify the input type as the parameter of the target
function; the fuzzer then generates valid inputs of that type. Behind the scenes it uses
value profiling to guide the fuzzer past comparisons in the target code far more
efficiently than hoping to stumble on the exact sequence of bytes by chance.
## Usage
For most scenarios, creating a fuzz target is straightforward: define a data type and a
target function that performs operations on it and checks, via assertions, that the
invariants hold. For more information on creating effective fuzz targets, please refer to
[What makes a good fuzz target](https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md).
Once the target function is defined, call `defaultMutator` with that function as its
argument.
A basic fuzz target that only verifies that the software under test does not crash, using
a simple input type, can suffice:
```nim
import drchaos

proc fuzzMe(s: string, a, b, c: int32) =
  # The function being tested.
  if a == 0xdeadc0de'i32 and b == 0x11111111'i32 and c == 0x22222222'i32:
    if s.len == 100: doAssert false

proc fuzzTarget(data: (string, int32, int32, int32)) =
  let (s, a, b, c) = data
  fuzzMe(s, a, b, c)

defaultMutator(fuzzTarget)
```
> **WARNING**: Modifying the input variable within fuzz targets is not allowed.
> If you are using ref types, you can prevent modifications by utilizing the `func` keyword
> and `{.experimental: "strictFuncs".}` in your code.
It is also possible to create more complex fuzz targets, such as the one shown below:
```nim
import drchaos

type
  ContentNodeKind = enum
    P, Br, Text

  ContentNode = object
    case kind: ContentNodeKind
    of P: pChildren: seq[ContentNode]
    of Br: discard
    of Text: textStr: string

proc `==`(a, b: ContentNode): bool =
  if a.kind != b.kind: return false
  case a.kind
  of P: return a.pChildren == b.pChildren
  of Br: return true
  of Text: return a.textStr == b.textStr

proc fuzzTarget(x: ContentNode) =
  # Convert or translate `x` to any desired format (JSON, HTML, binary, etc.),
  # and then feed it into the API being tested.
  discard # Placeholder so the proc body compiles; replace with the real call.

defaultMutator(fuzzTarget)
```
With drchaos, it is possible to generate millions of inputs and run `fuzzTarget` on them
within just a few seconds. More elaborate examples, such as fuzzing a graph library, can be
found in the [examples](examples/) directory.
It is critical to define a `==` proc for the input type. Overloading
`proc default(_: typedesc[T]): T` can also be advantageous, especially when `nil` is not a
valid value for `ref`.
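As a minimal sketch of these two overloads, the snippet below uses a hypothetical `Node` ref type (not part of drchaos) for which `nil` is not a meaningful top-level value. It only illustrates the shape of the procs and makes no claim about how drchaos calls them internally.

```nim
type
  Node = ref object        # hypothetical input type, for illustration only
    value: int
    next: Node

proc default(_: typedesc[Node]): Node =
  # Return an allocated node instead of `nil` as the starting value.
  Node(value: 0)

proc `==`(a, b: Node): bool =
  # Structural equality; two nil references compare equal.
  if a.isNil or b.isNil: return a.isNil and b.isNil
  a.value == b.value and a.next == b.next

doAssert not default(Node).isNil
doAssert Node(value: 1) == Node(value: 1)
```

Comparing structurally rather than by reference identity means two separately allocated but identical inputs are treated as equal.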
### Needed config
To compile the fuzz target, it is recommended to use at least the following flags:
`--cc:clang -d:useMalloc -t:"-fsanitize=fuzzer,address,undefined" -l:"-fsanitize=fuzzer,address,undefined" -d:nosignalhandler --nomain:on -g`.
Additionally, it is recommended to use `--mm:arc|orc` when possible.
Sample nim.cfg and .nimble files can be found in the [tests/](tests/nim.cfg) directory and
[this repository](https://github.com/planetis-m/fuzz-playground/blob/master/playground.nimble), respectively.
Alternatively, drchaos offers structured input for fuzzing using [nim-testutils](https://github.com/status-im/nim-testutils). This includes a convenient [testrunner](https://github.com/status-im/nim-testutils/blob/master/testutils/readme.md).
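For projects that prefer NimScript configuration, the recommended flags listed above could be expressed in a `config.nims` file along these lines. This is a sketch, not the project's own configuration: it assumes that `-t`, `-l` and `-g` are the short forms of `--passC`, `--passL` and `--debugger:native`, and it picks `orc` from the `arc|orc` choice.

```nim
# config.nims (sketch): mirrors the recommended command-line flags.
switch("cc", "clang")
switch("define", "useMalloc")
switch("passC", "-fsanitize=fuzzer,address,undefined")  # -t:"..."
switch("passL", "-fsanitize=fuzzer,address,undefined")  # -l:"..."
switch("define", "nosignalhandler")
switch("noMain", "on")
switch("debugger", "native")                            # -g
switch("mm", "orc")                                     # or "arc"
```

A `config.nims` placed next to the fuzz targets applies to every file compiled in that directory, so it is best kept separate from the main project configuration.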
### Post-processors
In some cases, it may be necessary to modify the randomized input to include specific
values or create dependencies between certain fields. To support this, drchaos offers a
post-processing step that runs on compound types: object, tuple, ref, seq, string, array,
and set. Restricting the step to these types is a deliberate choice for performance and
clarity, with distinct types being the exception.
```nim
proc postProcess(x: var ContentNode; r: var Rand) =
  if x.kind == Text:
    x.textStr = "The man the professor the student has studies Rome."
```
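A post-processor can also enforce a dependency between fields. The sketch below uses a hypothetical `Packet` type whose `length` field must always match its payload, and assumes that `Rand` is the type from `std/random`. The proc is called by hand here only to show its effect; in a real target drchaos would invoke it.

```nim
import std/random

type
  Packet = object          # hypothetical input type, not part of drchaos
    length: int32          # invariant: always equals payload.len
    payload: seq[byte]

proc postProcess(x: var Packet; r: var Rand) =
  # Re-establish the invariant after mutation. `r` is unused here but is
  # available when a fix-up needs randomness.
  x.length = int32(x.payload.len)

# Manual call for demonstration purposes only.
var r = initRand(2024)
var p = Packet(length: 99, payload: @[1'u8, 2, 3])
postProcess(p, r)
doAssert p.length == 3
```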
### Custom mutator
The `defaultMutator` is a convenient way to generate and mutate inputs for a given
fuzz target. However, if more fine-grained control is needed, the `customMutator`
can be used. With `customMutator`, the mutation procedure can be customized to
perform specific actions, such as uncompressing a `seq[byte]` before calling
`runMutator` on the raw data, and then compressing the output again.
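```nim
# `compress` and `uncompress` are placeholders for the codec used by the
# code under test; they are not provided by drchaos.
proc myTarget(x: seq[byte]) =
  var data = uncompress(x)
  # ...

proc myMutator(x: var seq[byte]; sizeIncreaseHint: Natural; r: var Rand) =
  # Mutate the raw data, then store it back in compressed form.
  var data = uncompress(x)
  runMutator(data, sizeIncreaseHint, r)
  x = compress(data)

customMutator(myTarget, myMutator)
```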
### User-defined mutate procs
Distinct types can be used to provide a mutate overload for fields with unique values or
to restrict the search space. For example, it is possible to define a distinct type for
file signatures or other specific values that may be of interest.
```nim
# Inside the library being fuzzed
when defined(runFuzzTests):
  type
    ClientId = distinct int

  proc `==`(a, b: ClientId): bool {.borrow.}
else:
  type
    ClientId = int

# Inside a test file
import drchaos/mutator

const
  idA = 0.ClientId
  idB = 2.ClientId
  idC = 4.ClientId

proc mutate(value: var ClientId; sizeIncreaseHint: int; enforceChanges: bool; r: var Rand) =
  # Call `random.rand()` to return a new value.
  repeatMutate(r.sample([idA, idB, idC]))
```
The `drchaos/mutator` module exports mutators for every supported type to aid in the
creation of mutate functions.
### User-defined serializers
User overloads must match the following `proc` signatures:
```nim
proc fromData(data: openArray[byte]; pos: var int; output: var T)
proc toData(data: var openArray[byte]; pos: var int; input: T)
proc byteSize(x: T): int {.inline.} # The amount of memory that the serialized type will occupy, measured in bytes.
```
The need for this arises only in the case of objects that include raw pointers. To address
this, `drchaos/common` offers read/write procedures to simplify the process.
It is necessary to define the `mutate`, `default` and `==` procedures. For container
types, it is also necessary to define `mitems` or `mpairs` iterators.
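To make the three signatures concrete, here is a minimal sketch for a hypothetical `Boxed` object that holds a raw pointer. It serializes the pointed-to value rather than the pointer itself and uses plain `copyMem` instead of the `drchaos/common` helpers, whose exact signatures are not shown in this document. Bounds checks and ownership handling are omitted for brevity.

```nim
type
  Boxed = object           # hypothetical type, for illustration only
    p: ptr int32           # the pointer's own bytes are meaningless once saved

proc byteSize(x: Boxed): int {.inline.} =
  sizeof(int32)            # only the pointed-to value is stored

proc toData(data: var openArray[byte]; pos: var int; input: Boxed) =
  # Write the pointed-to value (0 for nil) at `pos`, then advance `pos`.
  var v = if input.p != nil: input.p[] else: 0'i32
  copyMem(addr data[pos], addr v, sizeof(v))
  inc pos, sizeof(v)

proc fromData(data: openArray[byte]; pos: var int; output: var Boxed) =
  # Allocate storage on first use, read the value back, then advance `pos`.
  if output.p == nil: output.p = create(int32)
  copyMem(output.p, unsafeAddr data[pos], sizeof(int32))
  inc pos, sizeof(int32)

# Round trip through a byte buffer.
var src = Boxed(p: create(int32))
src.p[] = 7
var buf = newSeq[byte](byteSize(src))
var pos = 0
toData(buf, pos, src)
var dst: Boxed
pos = 0
fromData(buf, pos, dst)
doAssert dst.p[] == 7
dealloc(src.p)
dealloc(dst.p)
```

Both procs advance `pos` by exactly `byteSize(x)`, which keeps the fields that follow in the buffer aligned with what the reader expects.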
### Best practices and considerations
- Avoid using `echo` in a fuzz target as it can significantly slow down the execution speed.
- Prefer using `-d:danger` for maximum performance, but ensure that your code is free from
undefined behavior and does not rely on any assumptions that may break in unexpected ways.
- Once you have identified a crash, you can recompile the program with `-d:debug` and pass the
crashing test case as a parameter to further investigate the cause of the crash.
- Use `debugEcho(x)` in a target to print the input that caused the crash, which can be
helpful in debugging and reproducing the issue.
- Although disabling sanitizers may improve performance, it is not recommended as
AddressSanitizer can help catch memory errors and undefined behavior that may lead to
crashes or other bugs.
### What's not supported
- Polymorphic types do not have serialization support.
- References with cycles are not supported. However, a .noFuzz custom pragma will be added soon for cursors.
- Object variants only work with the latest memory management model, which is `--mm:arc|orc`.
## Advantages of using drchaos for fuzzing
drchaos offers a number of advantages over frameworks based on
[FuzzDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md),
which often have difficulty handling nested dynamic types. For a more detailed
explanation of these issues, you can read an article by the author of Fuzzcheck, available
at the following link: <https://github.com/loiclec/fuzzcheck-rs/blob/main/articles/why_not_bytes.md>
## Bugs discovered with the assistance of drchaos
The drchaos framework has helped discover various bugs in software projects. Here are some
examples of bugs that were found in the Nim reference implementation with the help of
drchaos:
* Use-after-free bugs in object variants (https://github.com/nim-lang/Nim/issues/20305)
* OpenArray on an empty sequence triggers undefined behavior (https://github.com/nim-lang/Nim/issues/20294)
## License
Licensed and distributed under either of
* MIT license: [LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT
or
* Apache License, Version 2.0, ([LICENSE-APACHEv2](LICENSE-APACHEv2) or http://www.apache.org/licenses/LICENSE-2.0)
at your option. These files may not be copied, modified, or distributed except according to those terms.