drchaos

Fuzzing is a technique for automated bug detection that involves providing random inputs to a target program to induce crashes. This approach can increase test coverage, enabling the identification of edge cases and more efficient triggering of bugs.

drchaos extends the Nim interface to LLVM/Clang libFuzzer, an in-process, coverage-guided, evolutionary fuzzing engine, and adds support for structured fuzzing. To use it, specify the input type as the parameter of the target function, and the fuzzer generates valid inputs of that type. Internally, it uses value profiling to guide the fuzzer past comparisons in the target code far more efficiently than relying on stumbling upon the exact sequence of bytes by chance.

Usage

In most cases, creating a fuzz target is straightforward: define a data type and a target function that performs some operations and checks, via assert conditions, that the invariants still hold. For guidance on writing effective fuzz targets, see What makes a good fuzz target. Once the target function is defined, call defaultMutator with that function as its argument.

A fuzz target can be as basic as defining a simple input type and checking that the software under test does not crash:

import drchaos

proc fuzzMe(s: string, a, b, c: int32) =
  # The function being tested.
  if a == 0xdeadc0de'i32 and b == 0x11111111'i32 and c == 0x22222222'i32:
    if s.len == 100: doAssert false

proc fuzzTarget(data: (string, int32, int32, int32)) =
  let (s, a, b, c) = data
  fuzzMe(s, a, b, c)

defaultMutator(fuzzTarget)

WARNING: Fuzz targets must not modify the input variable. If you are using ref types, you can prevent accidental modification by declaring the target with the func keyword and enabling {.experimental: "strictFuncs".} in your code.
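As an illustration of the warning above, here is a minimal sketch of a target over a ref type declared with func. The Node type and the sumValues routine are hypothetical stand-ins for the code under test; the target would be passed to defaultMutator as in the other examples.

```nim
{.experimental: "strictFuncs".}

type
  Node = ref object
    value: int
    next: Node

# Hypothetical routine standing in for the code under test.
func sumValues(head: Node): int =
  # Walking the list only reads through the parameter, which strictFuncs allows.
  var cur = head
  while cur != nil:
    result += cur.value
    cur = cur.next

# Because this is a `func` under strictFuncs, a statement such as
# `x.value = 0` in the body would be rejected at compile time.
func fuzzTarget(x: Node) =
  discard sumValues(x)
```

With this setup, an accidental write through x becomes a compile error rather than a silent change to the fuzzer's input.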

It is also possible to create more complex fuzz targets, such as the one shown below:

import drchaos

type
  ContentNodeKind = enum
    P, Br, Text
  ContentNode = object
    case kind: ContentNodeKind
    of P: pChildren: seq[ContentNode]
    of Br: discard
    of Text: textStr: string

proc `==`(a, b: ContentNode): bool =
  if a.kind != b.kind: return false
  case a.kind
  of P: return a.pChildren == b.pChildren
  of Br: return true
  of Text: return a.textStr == b.textStr

proc fuzzTarget(x: ContentNode) =
  # Convert or translate `x` to any desired format (JSON, HTML, binary, etc.),
  # and then feed it into the API being tested.
  discard # placeholder so the empty body compiles

defaultMutator(fuzzTarget)

With drchaos, it is possible to generate millions of inputs and execute fuzzTarget within just a few seconds. More elaborate examples, such as fuzzing a graph library, can be found in the examples directory.

It is critical to define a == proc for the input type. Overloading proc default(_: typedesc[T]): T can also be advantageous, especially when nil is not a valid value for ref.
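A minimal sketch of both overloads for a ref type follows. The Config type and its fields are hypothetical; the point is only that default returns an allocated value instead of nil and that == compares contents rather than addresses.

```nim
type
  Config = ref object
    name: string
    retries: int

# Structural equality that also tolerates nil on either side.
proc `==`(a, b: Config): bool =
  if a.isNil or b.isNil:
    return a.isNil and b.isNil
  a.name == b.name and a.retries == b.retries

# Overload of `default` so that a fresh Config is never nil.
proc default(_: typedesc[Config]): Config =
  Config(name: "", retries: 0)
```

Code that cannot handle a nil Config then starts from a valid, empty object.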

Needed config

To compile the fuzz target, it is recommended to use at least the following flags: --cc:clang -d:useMalloc -t:"-fsanitize=fuzzer,address,undefined" -l:"-fsanitize=fuzzer,address,undefined" -d:nosignalhandler --nomain:on -g. Additionally, it is recommended to use --mm:arc|orc when possible.
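The same flags can also be kept in a NimScript configuration file. The sketch below assumes a file named config.nims placed next to the fuzz target and a Nim version that accepts --mm; it is only a translation of the command-line flags listed above, not a file taken from this repository.

```nim
# config.nims (assumed file name and location)
switch("cc", "clang")
switch("define", "useMalloc")
# -t and -l are the short forms of --passC and --passL.
switch("passC", "-fsanitize=fuzzer,address,undefined")
switch("passL", "-fsanitize=fuzzer,address,undefined")
switch("define", "nosignalhandler")
switch("noMain", "on")
switch("debugger", "native") # long form of -g
switch("mm", "arc")          # or "orc"
```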

Sample nim.cfg and .nimble files can be found in the tests/ directory and this repository, respectively.

Alternatively, drchaos offers structured input for fuzzing using nim-testutils. This includes a convenient testrunner.

Post-processors

In some cases, it may be necessary to modify the randomized input to include specific values or to create dependencies between certain fields. To support this, drchaos offers a post-processing step that runs on compound types such as object, tuple, ref, seq, string, array, and set. For performance and clarity, the step runs only on these types; distinct types are the exception.

proc postProcess(x: var ContentNode; r: var Rand) =
  if x.kind == Text:
    x.textStr = "The man the professor the student has studies Rome."
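A post-processor can also restore a dependency between fields. The sketch below uses a hypothetical Packet type whose declaredLen field must always match the payload length; the postProcess signature is the one shown above.

```nim
import std/random

type
  Packet = object
    payload: string
    declaredLen: int32

proc postProcess(x: var Packet; r: var Rand) =
  # Re-establish the invariant after mutation: the declared length
  # always equals the real payload length. `r` is unused here.
  x.declaredLen = int32(x.payload.len)
```

Without such a step, most mutated packets would be rejected by a length check before reaching deeper code paths.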

Custom mutator

The defaultMutator is a convenient way to generate and mutate inputs for a given fuzz target. However, if more fine-grained control is needed, the customMutator can be used. With customMutator, the mutation procedure can be customized to perform specific actions, such as uncompressing a seq[byte] before calling runMutator on the raw data, and then compressing the output again.

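# `uncompress` and `compress` are placeholders for a user-supplied codec.
proc myTarget(x: seq[byte]) =
  var data = uncompress(x)
  # ... exercise the code under test with the decompressed `data`

proc myMutator(x: var seq[byte]; sizeIncreaseHint: Natural; r: var Rand) =
  # Mutate the decompressed form, then compress it again so the stored
  # input stays in the format the target expects.
  var data = uncompress(x)
  runMutator(data, sizeIncreaseHint, r)
  x = compress(data)

customMutator(myTarget, myMutator)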

User-defined mutate procs

Distinct types can be used to provide a mutate overload for fields with unique values or to restrict the search space. For example, it is possible to define a distinct type for file signatures or other specific values that may be of interest.

# Inside the library being fuzzed
when defined(runFuzzTests):
  type
    ClientId = distinct int

  proc `==`(a, b: ClientId): bool {.borrow.}
else:
  type
    ClientId = int

# Inside a test file
import std/random
import drchaos/mutator

const
  idA = 0.ClientId
  idB = 2.ClientId
  idC = 4.ClientId

proc mutate(value: var ClientId; sizeIncreaseHint: int; enforceChanges: bool; r: var Rand) =
  # Use `random.sample()` (or `random.rand()`) to produce a new value.
  repeatMutate(r.sample([idA, idB, idC]))

The drchaos/mutator module exports mutators for every supported type to aid in the creation of mutate functions.

User-defined serializers

User overloads must use the following proc signatures:

proc fromData(data: openArray[byte]; pos: var int; output: var T)
proc toData(data: var openArray[byte]; pos: var int; input: T)
proc byteSize(x: T): int {.inline.} # The amount of memory that the serialized type will occupy, measured in bytes.

Custom serializers are needed only for objects that contain raw pointers. The drchaos/common module provides read/write procedures that simplify writing them.

For such types, the mutate, default and == procedures must also be defined. Container types additionally need mitems or mpairs iterators.
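As a rough sketch of the three signatures above, the following serializes a hypothetical Session type that holds a raw pointer. Only the id field is written; the pointer is reset to nil on load. The byte layout (little-endian, no bounds checks) is an assumption made for illustration, and the drchaos/common helpers are deliberately not used because their names are not shown here.

```nim
type
  Session = object
    id: uint32
    ctx: pointer # raw pointer, meaningless outside the current process

proc byteSize(x: Session): int {.inline.} =
  # Only `id` is serialized.
  sizeof(uint32)

proc toData(data: var openArray[byte]; pos: var int; input: Session) =
  # Write `id` as four little-endian bytes and advance the cursor.
  for i in 0 ..< sizeof(uint32):
    data[pos + i] = byte((input.id shr (8 * i)) and 0xff'u32)
  inc pos, sizeof(uint32)

proc fromData(data: openArray[byte]; pos: var int; output: var Session) =
  # Read the four bytes back and leave the pointer unset.
  var id = 0'u32
  for i in 0 ..< sizeof(uint32):
    id = id or (uint32(data[pos + i]) shl (8 * i))
  inc pos, sizeof(uint32)
  output = Session(id: id, ctx: nil)
```

A round trip through toData and fromData should reproduce the id and advance pos by byteSize each time.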

Best practices and considerations

  • Avoid using echo in a fuzz target as it can significantly slow down the execution speed.

  • Prefer using -d:danger for maximum performance, but ensure that your code is free from undefined behavior and does not rely on any assumptions that may break in unexpected ways.

  • Once you have identified a crash, you can recompile the program with -d:debug and pass the crashing test case as a parameter to further investigate the cause of the crash.

  • Use debugEcho(x) in a target to print the input that caused the crash, which can be helpful in debugging and reproducing the issue.

  • Although disabling sanitizers may improve performance, it is not recommended: the address and undefined-behavior sanitizers catch memory errors and undefined behavior that may otherwise lead to crashes or other bugs.

What's not supported

  • Polymorphic types do not have serialization support.
  • References with cycles are not supported. However, a .noFuzz custom pragma will be added soon for cursors.
  • Object variants work only with the newer memory management modes, --mm:arc or --mm:orc.

Advantages of using drchaos for fuzzing

drchaos offers a number of advantages over frameworks based on FuzzDataProvider, which often have difficulty handling nested dynamic types. For a more detailed explanation of these issues, you can read an article by the author of Fuzzcheck, available at the following link: https://github.com/loiclec/fuzzcheck-rs/blob/main/articles/why_not_bytes.md

Bugs discovered with the assistance of drchaos

The drchaos framework has helped discover various bugs in software projects. Here are some examples of bugs that were found in the Nim reference implementation with the help of drchaos:

License

Licensed and distributed under either of

or

at your option. These files may not be copied, modified, or distributed except according to those terms.