chroprof - Profiling for [Chronos](https://github.com/status-im/nim-chronos)
============================================================================

This repo contains a usable profiler for [Chronos](https://github.com/status-im/nim-chronos). For the time being, it requires [a modified version of Chronos V4](https://github.com/codex-storage/nim-chronos/tree/feature/profiler-v4) which has profiling hooks enabled. Before attempting to use this, make sure you [understand the limitations](#limitations). Some of the rationale for the design and implementation of the profiler [can be found here](https://hackmd.io/eQ_ouNV4QZe0TG334_gkFg).

1. [Enabling profiling](#enabling-profiling)
2. [Looking at metrics](#looking-at-metrics)
2. [Enabling profiling with Prometheus metrics](#enabling-profiling-with-prometheus-metrics)
3. [Limitations](#limitations)


## Enabling Profiling

**Compile-time flag.** Profiling requires the `-d:chronosProfiling` compile-time
flag. If you do not pass it, importing `chroprof` will fail.

**Enabling the profiler.** The profiler must be enabled per event loop thread.
To enable it, you need to call, from the thread that will run your event loop:

```nim
import chroprof

enableProfiling()
``` 

## Looking at Metrics

At any time during execution, you can get a snapshot of the profiler metrics 
by calling `getMetrics()`. This will return a [`MetricsTotals`](https://github.com/codex-storage/nim-chroprof/blob/master/chroprof/profiler.nim#L61) object which
is a table mapping [`FutureType`](https://github.com/codex-storage/nim-chroprof/blob/master/chroprof/profiler.nim#L13)s to 
[`AggregateMetrics`](https://github.com/codex-storage/nim-chroprof/blob/master/chroprof/profiler.nim#L17). You may then 
print, log, or do whatever you like with those, including [export them to Prometheus](#enabling-profiling-with-prometheus-metrics).

`getMetrics()` will return the metrics for the event loop that is running 
(or ran) on the calling thread.

## Enabling profiling with Prometheus metrics

You can export metrics on the top-`k` async procs that are occupying the event
loop thread the most by enabling the profiler's [nim-metrics](https://github.com/status-im/nim-metrics/) collector:

```nim
import chroprof/collector

# Exports metrics for the 50 heaviest procs
enableProfilerMetrics(50)
```

with the help of [Grafana](https://grafana.com/), one can visualize and readily identify bottlenecks:

![Grafana screenshot](https://github.com/codex-storage/nim-chroprof/blob/gh-pages/assets/images/profiling-slowdown.png?raw=true)

the cumulative chart on the left shows that two procs (with the bottom one 
turning out to be a child of the top one) were dominating execution time at a 
certain point, whereas the one on the right shows a number of peaks and anomalies
which, in the context of a bug, may help identify the cause.

## Limitations

* Nested `waitFor` calls are not supported (as in, it'll crash your app);
* Prometheus metrics only work with `refc` because nim-metrics only works with `refc`;
* the Prometheus metrics collector can only be enabled for one event loop; i.e.,
  you cannot have multiple loops in different threads publishing metrics to Prometheus.