rewrite the readme

This commit is contained in:
Antonis Geralis 2023-04-10 17:18:53 +03:00
parent cda7648d2e
commit 8f5510b035
1 changed files with 95 additions and 66 deletions

161
README.md
View File

@ -1,30 +1,35 @@
# drchaos
Fuzzing is an automated bug finding technique, where randomized inputs are fed to a target
program in order to get it to crash. With fuzzing, you can increase your test coverage to
find edge cases and trigger bugs more effectively.
Fuzzing is a technique for automated bug detection that involves providing random inputs
to a target program to induce crashes. This approach can increase test coverage, enabling
the identification of edge cases and more efficient triggering of bugs.
drchaos extends the Nim interface to LLVM/Clang libFuzzer, an in-process,
coverage-guided, evolutionary fuzzing engine. And adds support for
Drchaos extends the Nim interface to LLVM/Clang libFuzzer, an in-process, coverage-guided,
and evolutionary fuzzing engine, while also introducing support for
[structured fuzzing](https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md).
The user should define the input type, as a parameter to the target function and the
fuzzer is responsible for providing valid inputs. Behind the scenes it uses value profiling
to guide the fuzzer past these comparisons much more efficiently than simply hoping to
stumble on the exact sequence of bytes by chance.
To utilize this functionality, users must specify the input type as a parameter for the
target function, and the fuzzer generates valid inputs. This process employs value
profiling to direct the fuzzer beyond these comparisons more efficiently than relying on
the probability of finding the exact sequence of bytes by chance.
## Usage
For most cases, it is fairly trivial to define a data type and a target function that
performs some operations and checks if the invariants expressed as assert conditions still
hold. See [What makes a good fuzz target](https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md)
for more information. Then call `defaultMutator` with that function as parameter. That fuzz target can be as basic as
defining a fixed-size type and ensuring the software under test doesn't crash like:
Creating a fuzz target by defining a data type and a target function that performs
operations and verifies if the invariants are maintained via assert conditions is usually
an uncomplicated task for most scenarios. For more information on creating effective fuzz
targets, please refer to
[What makes a good fuzz target](https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md)
Once the target function is defined, the `defaultMutator` can be called with that function
as argument.
A basic fuzz target, such as verifying that the software under test remains stable without
crashing by defining a fixed-size type, can suffice:
```nim
import drchaos
proc fuzzMe(s: string, a, b, c: int32) =
# function under test
# The function being tested.
if a == 0xdeadc0de'i32 and b == 0x11111111'i32 and c == 0x22222222'i32:
if s.len == 100: doAssert false
@ -35,10 +40,11 @@ proc fuzzTarget(data: (string, int32, int32, int32)) =
defaultMutator(fuzzTarget)
```
> **WARNING**: Fuzz targets must not modify the input variable. This can be ensured for `ref`
> pointers by using the `func` keyword and {.experimental: "strictFuncs".}
> **WARNING**: Modifying the input variable within fuzz targets is not allowed.
> If you are using ref types, you can prevent modifications by utilizing the `func` keyword
> and `{.experimental: "strictFuncs".}` in your code.
Or complex as shown bellow:
It is also possible to create more complex fuzz targets, such as the one shown below:
```nim
import drchaos
@ -60,33 +66,38 @@ proc `==`(a, b: ContentNode): bool =
of Text: return a.textStr == b.textStr
proc fuzzTarget(x: ContentNode) =
# Convert or translate `x` to any format (JSON, HMTL, binary, etc...)
# and feed it to the API you are testing.
# Convert or translate `x` to any desired format (JSON, HMTL, binary, etc.),
# and then feed it into the API being tested.
defaultMutator(fuzzTarget)
```
drchaos will generate millions of inputs and run `fuzzTarget` under a few seconds.
More articulate examples, such as fuzzing a graph library are in the `examples/` directory.
Using drchaos, it is possible to generate millions of inputs and execute fuzzTarget within
just a few seconds. More elaborate examples, such as fuzzing a graph library, can be
located in the [examples](examples/) directory.
Defining a `==` proc for the input type is necessary. `proc default(_: typedesc[T]): T` can also
be overloaded. Which is especially useful when `nil` for `ref` is not an acceptable value.
It is critical to define a `==` proc for the input type. Overloading
`proc default(_: typedesc[T]): T` can also be advantageous, especially when `nil` is not a
valid value for `ref`.
### Needed config
Compile with at least `--cc:clang -d:useMalloc -t:"-fsanitize=fuzzer,address,undefined" -l:"-fsanitize=fuzzer,address,undefined" -d:nosignalhandler --nomain:on -g`, `--mm:arc|orc` is recommended.
To compile the fuzz target, it is recommended to use at least the following flags:
`--cc:clang -d:useMalloc -t:"-fsanitize=fuzzer,address,undefined" -l:"-fsanitize=fuzzer,address,undefined" -d:nosignalhandler --nomain:on -g`.
Additionally, it is recommended to use `--mm:arc|orc` when possible.
Sample [nim.cfg](tests/nim.cfg) and [.nimble](https://github.com/planetis-m/fuzz-playground/blob/master/playground.nimble) files
Sample nim.cfg and .nimble files can be found in the [tests/](tests/nim.cfg) directory and
[this repository](https://github.com/planetis-m/fuzz-playground/blob/master/playground.nimble), respectively.
Alternatively, drchaos provides structured input for fuzzing with [nim-testutils](https://github.com/status-im/nim-testutils)
Which includes a convenient [testrunner](https://github.com/status-im/nim-testutils/blob/master/testutils/readme.md)
Alternatively, drchaos offers structured input for fuzzing using [nim-testutils](https://github.com/status-im/nim-testutils). This includes a convenient [testrunner](https://github.com/status-im/nim-testutils/blob/master/testutils/readme.md).
### Post-processors
Sometimes it is necessary to adjust the random input in order to add magic values or
dependencies between some fields. This is supported with a post-processing step, which for
performance and clarity reasons only runs on compound types such as
object/tuple/ref/seq/string/array/set and by exception distinct types.
In some cases, it may be necessary to modify the randomized input to include specific
values or create dependencies between certain fields. To support this functionality,
drchaos offers a post-processing step that runs on compound types like object, tuple, ref,
seq, string, array, and set. This step is only executed on these types for performance and
clarity purposes, with distinct types being the exception.
```nim
proc postProcess(x: var ContentNode; r: var Rand) =
@ -96,14 +107,16 @@ proc postProcess(x: var ContentNode; r: var Rand) =
### Custom mutator
Besides `defaultMutator` there is also `customMutator` which allows more fine-grained
control of the mutation procedure, like uncompressing a `seq[byte]` then calling
`runMutator` on the raw data and compressing the output again.
The `defaultMutator` is a convenient way to generate and mutate inputs for a given
fuzz target. However, if more fine-grained control is needed, the `customMutator`
can be used. With `customMutator`, the mutation procedure can be customized to
perform specific actions, such as uncompressing a `seq[byte]` before calling
`runMutator` on the raw data, and then compressing the output again.
```nim
proc myTarget(x: seq[byte]) =
var data = uncompress(x)
...
# ...
proc myMutator(x: var seq[byte]; sizeIncreaseHint: Natural; r: var Rand) =
var data = uncompress(x)
@ -115,11 +128,12 @@ customMutator(myTarget, myMutator)
### User-defined mutate procs
It's possible to use distinct types to provide a mutate overload for fields that have
interesting values, like file signatures or to limit the search space.
Distinct types can be used to provide a mutate overload for fields with unique values or
to restrict the search space. For example, it is possible to define a distinct type for
file signatures or other specific values that may be of interest.
```nim
# Fuzzed library
# Inside the library being fuzzed
when defined(runFuzzTests):
type
ClientId = distinct int
@ -129,7 +143,7 @@ else:
type
ClientId = int
# In a test file
# Inside a test file
import drchaos/mutator
const
@ -138,57 +152,72 @@ const
idC = 4.ClientId
proc mutate(value: var ClientId; sizeIncreaseHint: int; enforceChanges: bool; r: var Rand) =
# use `rand()` to return a new value.
# Call `random.rand()` to return a new value.
repeatMutate(r.sample([idA, idB, idC]))
```
For aiding the creation of mutate functions, mutators for every supported type are
exported by `drchaos/mutator`.
The `drchaos/mutator` module exports mutators for every supported type to aid in the
creation of mutate functions.
### User-defined serializers
User overloads must use the following proc signatures:
User overloads should follow the following `proc` signatures:
```nim
proc fromData(data: openArray[byte]; pos: var int; output: var T)
proc toData(data: var openArray[byte]; pos: var int; input: T)
proc byteSize(x: T): int {.inline.} ## The size that will be consumed by the serialized type in bytes.
proc byteSize(x: T): int {.inline.} # The amount of memory that the serialized type will occupy, measured in bytes.
```
This is only necessary for objects that contain raw pointers.
`drchaos/common` exports read/write procs that assist with this task.
The need for this arises only in the case of objects that include raw pointers. To address
this, `drchaos/common` offers read/write procedures to simplify the process.
`mutate`, `default` and `==` must also be defined.
Containers also need `mitems` or `mpairs` iterators.
It is necessary to define the `mutate`, `default` and `==` procedures. For container
types, it is also necessary to define `mitems` or `mpairs` iterators.
### Dos and don'ts
### Best practices and considerations
- Don't `echo` in a fuzz target as it slows down execution speed.
- Prefer `-d:danger` for maximum performance.
- Once you have a crash you can recompile with `-d:debug` and pass the crashing test case as parameter.
- Use `debugEcho(x)` in a target to print the crashing input.
- You could compile without sanitizers, AddressSanitizer slows down programs by ~2x, but it's not recommended.
- Avoid using `echo` in a fuzz target as it can significantly slow down the execution speed.
- Prefer using `-d:danger` for maximum performance, but ensure that your code is free from
undefined behavior and does not rely on any assumptions that may break in unexpected ways.
- Once you have identified a crash, you can recompile the program with `-d:debug` and pass the
crashing test case as a parameter to further investigate the cause of the crash.
- Use `debugEcho(x)` in a target to print the input that caused the crash, which can be
helpful in debugging and reproducing the issue.
- Although disabling sanitizers may improve performance, it is not recommended as
AddressSanitizer can help catch memory errors and undefined behavior that may lead to
crashes or other bugs.
### What's not supported
- Polymorphic types, missing serialization support.
- References with cycles. A `.noFuzz` custom pragma will be added soon for cursors.
- Object variants work only with the lastest memory management model `--mm:arc/orc`.
- Polymorphic types do not have serialization support.
- References with cycles are not supported. However, a .noFuzz custom pragma will be added soon for cursors.
- Object variants only work with the latest memory management model, which is `--mm:arc|orc`.
## Why choose drchaos
## Advantages of using drchaos for fuzzing
drchaos has several advantages over frameworks derived from
[FuzzDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md)
which struggle with dynamic types that in particular are nested. For a better explanation
read an article written by the author of
[Fuzzcheck](https://github.com/loiclec/fuzzcheck-rs/blob/main/articles/why_not_bytes.md).
drchaos offers a number of advantages over frameworks based on
[FuzzDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md),
which often have difficulty handling nested dynamic types. For a more detailed
explanation of these issues, you can read an article by the author of Fuzzcheck, available
at the following link: <https://github.com/loiclec/fuzzcheck-rs/blob/main/articles/why_not_bytes.md>
## Bugs found with the help of drchaos
### Nim reference implementation
* [use-after-free bugs in object variants](https://github.com/nim-lang/Nim/issues/20305)
* [openArray on empty seq triggers UB](https://github.com/nim-lang/Nim/issues/20294)
## Bugs discovered with the assistance of drchaos
The drchaos framework has helped discover various bugs in software projects. Here are some
examples of bugs that were found in the Nim reference implementation with the help of
drchaos:
* Use-after-free bugs in object variants (https://github.com/nim-lang/Nim/issues/20305)
* OpenArray on an empty sequence triggers undefined behavior (https://github.com/nim-lang/Nim/issues/20294)
## License