nim-metrics
Introduction
Nim metrics client library supporting the Prometheus monitoring toolkit, StatsD and Carbon. Designed to be thread-safe and efficient, it's disabled by default so libraries can use it without any overhead for those library users not interested in metrics.
Installation
You can install the development version of the library through Nimble with the following command:
nimble install https://github.com/status-im/nim-metrics@#master
Usage
To enable metrics, compile your code with -d:metrics --threads:on.
To avoid depending on PCRE, compile with -d:withoutPCRE.
Architectural overview
Collector objects holding various Metric objects are registered in one or more Registry objects. A default registry is used for the most common case.
Metric values are float64, but the API also accepts int64 parameters, which are then converted to float64.
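The int64 to float64 step is worth keeping in mind for very large counts. Here is a minimal stand-alone sketch in plain Nim (it does not use nim-metrics, and the variable names are made up for illustration) showing that integers above 2^53 can no longer be told apart once they are stored as float64:

```nim
# Stand-alone illustration; nothing here comes from the nim-metrics API.
let small = 10'i64
echo float64(small)                     # 10.0

# 2^53 + 1 has no exact float64 representation, so it rounds to 2^53.
let big = 9007199254740993'i64
echo float64(big) == float64(big - 1)   # true
```

For everyday counters and gauges this limit is far away, so it only matters if you feed in very large integers such as nanosecond timestamps.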
By starting an HTTP server, custom metrics (and some default ones) can be pulled by Prometheus. By specifying backends, those same custom metrics will be pushed to StatsD or Carbon servers, as soon as they are modified. They can also be serialised to strings for some quick and dirty logging. Integration with the Chronicles logging library is available in a separate module.
The HTTP server used for pulling runs in its own thread. Metric pushing also uses a dedicated thread for networking, in order to minimise the overhead.
Collector types
Counter
A counter's value can only be incremented.
# Declare a variable `myCounter` holding a `Counter` object with a `Metric`
# having the same name as the variable. The help string is mandatory. The initial
# value is 0 and it's automatically added to `defaultRegistry`.
declareCounter myCounter, "an example counter"
# increment it by 1
myCounter.inc()
# increment it by 10
myCounter.inc(10)
# count all exceptions in a block
someCounter.countExceptions:
  foo()
# or just an exception type
otherCounter.countExceptions(ValueError):
  bar()
# do you need a variable that's being exported from the module?
declarePublicCounter seenPeers, "number of seen peers"
# it's the equivalent of `var seenPeers* = ...`
# want to avoid declaring a variable, giving it a help string, or anything else for that matter?
counter("one_off_counter").inc()
# What this does is generate a {.global.} var, so as long as you use the same
# string, you're using the same counter. Using strings instead of identifiers
# skips any compiler protection in case of typos, so this API is not recommended
# for serious use.
Gauge
Gauges can be incremented, decremented or set to a given value.
declareGauge myGauge, "an example gauge" # or `declarePublicGauge` to export it
myGauge.inc(4.5)
myGauge.dec(2)
myGauge.set(10)
myGauge.setToCurrentTime() # Unix timestamp in seconds
myGauge.trackInProgress:
  # myGauge is incremented at the start of the block (a `myGauge.inc()` is being inserted here)
  foo()
  # and decremented at the end (`myGauge.dec()`)
# set the gauge to the runtime of a block, in seconds
myGauge.time:
  bar()
# alternative, unrecommended API
gauge("one_off_gauge").set(42)
Summary
Summaries sample observations and provide a total count and the sum of all observed values.
declareSummary mySummary, "an example summary" # or `declarePublicSummary` to export it
mySummary.observe(10)
mySummary.observe(0.5)
echo mySummary
This will print out:
# HELP mySummary an example summary
# TYPE mySummary summary
mySummary_sum 10.5 1569332171696
mySummary_count 2.0 1569332171696
mySummary_created 1569332171.0
# observe the execution duration of a block, in seconds
mySummary.time:
  foo()
# alternative, unrecommended API
summary("one_off_summary").observe(10)
Histogram
These cumulative histograms store the count and total sum of observed values, just like summaries. Furthermore, they place the observed values in configurable buckets and provide per-bucket counts.
Note that an observed value is counted in every bucket whose upper bound is greater than or equal to it.
declareHistogram myHistogram, "an example histogram" # or `declarePublicHistogram` to export it
# This uses the default bucket sizes: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0,
# 2.5, 5.0, 7.5, 10.0, Inf]
# You can customise the buckets:
declareHistogram withCustomBuckets, "custom buckets", buckets = [0.0, 1.0, 2.0]
# if you leave out the "Inf" bucket, it's added for you
withCustomBuckets.observe(0.5)
withCustomBuckets.observe(1)
withCustomBuckets.observe(1.5)
withCustomBuckets.observe(3.7)
echo withCustomBuckets
This will print out:
# HELP withCustomBuckets custom buckets
# TYPE withCustomBuckets histogram
withCustomBuckets_sum 6.7 1569334493506
withCustomBuckets_count 4.0 1569334493506
withCustomBuckets_created 1569334493.0
withCustomBuckets_bucket{le="0.0"} 0.0
withCustomBuckets_bucket{le="1.0"} 2.0 1569334493506
withCustomBuckets_bucket{le="2.0"} 3.0 1569334493506
withCustomBuckets_bucket{le="+Inf"} 4.0 1569334493506
# observe the execution duration of a block, in seconds
myHistogram.time:
  foo()
# alternative, unrecommended API
histogram("one_off_histogram").observe(10)
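The cumulative counting used by these histograms can be sketched in plain Nim. This is only an illustration of the bucket rule, not the library's implementation; the `cumulativeCounts` proc is a hypothetical helper. It uses the same buckets and observations as the withCustomBuckets example:

```nim
# Stand-alone sketch of cumulative bucket counting; does not use nim-metrics.
proc cumulativeCounts(buckets, observations: openArray[float64]): seq[int] =
  ## One counter per bucket; a value increments every bucket whose
  ## upper bound is greater than or equal to it.
  result = newSeq[int](buckets.len)
  for value in observations:
    for i, upperBound in buckets:
      if value <= upperBound:
        inc result[i]

let buckets = [0.0, 1.0, 2.0, Inf]
let counts = cumulativeCounts(buckets, [0.5, 1.0, 1.5, 3.7])
echo counts # @[0, 2, 3, 4]
```

The four counts line up with the le="0.0", le="1.0", le="2.0" and le="+Inf" lines in the output shown earlier.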
Custom collectors
Sometimes you need to create metrics on the fly, with a custom collect() method of a custom collector type.
Let's say you have a USB-attached power meter and, for some reason, you want to read the power consumption every time Prometheus reads your metrics:
import metrics, times
when defined(metrics):
  type PowerCollector = ref object of Gauge
  var powerCollector = PowerCollector.newCollector(name = "power_usage", help = "Instantaneous power usage - in watts.")
  method collect(collector: PowerCollector): Metrics =
    let timestamp = getTime().toMilliseconds()
    result[@[]] = @[
      Metric(
        name: "power_usage",
        value: getPowerUsage(), # your power-meter reader
        timestamp: timestamp,
      )
    ]
There's a bit of repetition in the collector and metric names, because we no longer have behind-the-scenes name copying/deriving there.
You can output multiple metrics from your custom collect() method. It's perfectly legal and we do that internally for our system/runtime metrics.
Try not to get creative with dynamic metric names - Prometheus has a hard time dealing with that.
Labels
Metric labels are supported for the Prometheus backend, as a way to add extra dimensions corresponding to each combination of metric name and label values. This can quickly get out of hand, as you can guess, so don't go overboard with this feature. (See also the relevant warnings in Prometheus' docs.)
You declare label names when defining the collector and label values each time you update it:
declareCounter lCounter, "example counter with labels", ["foo", "bar"]
lCounter.inc(labelValues = ["1", "a"]) # the label values must be strings
lCounter.inc(labelValues = ["2", "b"])
# How many metrics are now in this collector? Two, because we used two sets of label values:
echo lCounter
# HELP lCounter example counter with labels
# TYPE lCounter counter
lCounter_total{foo="1",bar="a"} 1.0 1569340503703
lCounter_created{foo="1",bar="a"} 1569340503.0
lCounter_total{foo="2",bar="b"} 1.0 1569340503703
lCounter_created{foo="2",bar="b"} 1569340503.0
(OK, there are four metrics in total, because each one gets a *_created buddy.)
So if you must use labels, make sure there's a finite and small number of possible label values being set.
Metric name and label name validation
We use Prometheus standards for that, so metric names must comply with the ^[a-zA-Z_:][a-zA-Z0-9_:]*$ regex while label names have to comply with ^[a-zA-Z_][a-zA-Z0-9_]*$.
In the examples you've seen so far, all collectors declared with declare<CollectorType> had more stringent naming rules, since their names were also identifiers for Nim variables - which can't have colons in them.
To overcome this, without relying on the discouraged alternative API, use the name parameter:
declareCounter cCounter, "counter with colons in name", name = "foo:bar:baz"
cCounter.inc()
echo cCounter
# HELP foo:bar:baz counter with colons in name
# TYPE foo:bar:baz counter
foo:bar:baz_total 1.0 1569341756504
foo:bar:baz_created 1569341756.0
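If you want to check a name before handing it to the library, the two rules can be sketched without any regex dependency. This is a stand-alone approximation of the patterns quoted above, not the library's own validation code; `matchesName`, `isValidMetricName` and `isValidLabelName` are hypothetical helpers:

```nim
# Stand-alone sketch; mirrors the two Prometheus naming patterns by hand.
const
  firstChars = {'a'..'z', 'A'..'Z', '_'}
  restChars = firstChars + {'0'..'9'}

proc matchesName(name: string, allowColon: bool): bool =
  ## Non-empty, first character a letter or underscore, the rest
  ## letters, digits or underscores; colons only if `allowColon` is set.
  if name.len == 0:
    return false
  for i, c in name:
    let allowed = if i == 0: firstChars else: restChars
    if c notin allowed and not (allowColon and c == ':'):
      return false
  true

proc isValidMetricName(name: string): bool = matchesName(name, allowColon = true)
proc isValidLabelName(name: string): bool = matchesName(name, allowColon = false)

echo isValidMetricName("foo:bar:baz")  # true
echo isValidLabelName("foo:bar")       # false
echo isValidMetricName("9lives")       # false
```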
Logging
Metrics are not logs, but you might want to log them nonetheless. The $ procedure is defined for collectors and registries, so you can just use the built-in string serialisation to print them:
echo myCounter, myGauge
echo defaultRegistry
Integration with Chronicles is available in a separate module:
import chronicles, metrics, metrics/chronicles_support
# ...
info "myCounter", myCounter
debug "default registry", defaultRegistry
Testing
When testing, you might want to isolate some collectors by registering them into a custom registry:
var myRegistry = newRegistry()
declareCounter myCounter, "help", registry = myRegistry
echo myRegistry
# this means that `myCounter` is no longer registered in `defaultRegistry`
echo defaultRegistry
These unoptimised (read "very inefficient") value() and valueByName() procedures for accessing metric values should only be used inside test suites:
suite "counter":
  test "basic":
    declareCounter myCounter, "help"
    check myCounter.value == 0
    myCounter.inc()
    check myCounter.value == 1
    declareSummary cSummary, "summary with colons in name", name = "foo:bar:baz"
    cSummary.observe(10)
    check cSummary.valueByName("foo:bar:baz_count") == 1
    check cSummary.valueByName("foo:bar:baz_sum") == 10
Prometheus endpoint
First, you need to choose the HTTP server implementation.
Standard library
Using asynchttpserver which is based on asyncdispatch from the Nim standard library:
import metrics, metrics/stdlib_httpserver
Chronos
Using Chronos - an asyncdispatch alternative:
import metrics, metrics/chronos_httpserver
Starting the HTTP server
Start an HTTP server listening on 127.0.0.1:8000 from which the Prometheus daemon can pull the metrics from all collectors in defaultRegistry (plus the default metrics):
startMetricsHttpServer()
Or set your own address and port to listen to:
import net
startMetricsHttpServer("127.0.0.1", Port(8000))
The HTTP server will run in its own thread. It will expose two endpoints:
- http://127.0.0.1:8000/metrics - Returns the metrics consumed by Prometheus.
- http://127.0.0.1:8000/health - Healthcheck that returns an OK string and a 200 status code.
System metrics
Default metrics available (see also the relevant Prometheus docs):
process_cpu_seconds_total
process_open_fds
process_max_fds
process_virtual_memory_bytes
process_resident_memory_bytes
process_start_time_seconds
nim_gc_mem_bytes[thread_id]
nim_gc_mem_occupied_bytes[thread_id]
nim_gc_heap_instance_occupied_bytes[type_name]
nim_gc_heap_instance_occupied_summed_bytes
The process_* metrics are only available on Linux, for now.
nim_gc_heap_instance_occupied_bytes is only available when compiling with -d:nimTypeNames and holds the top 10 instance types, in reverse order of their total heap usage (from all threads), at the time the metric is created. Since this set changes with time, you'll see more than 10 types in Grafana.
The thread-specific metrics are updated automatically when a user-defined metric is changed in the main thread, but only if a minimal interval has passed since the last update (defaults to 10 seconds). All other system metrics are custom collectors which are updated at collection time.
import times
when defined(metrics):
  # get the default minimal update interval
  echo getSystemMetricsUpdateInterval()
  # you can change it
  setSystemMetricsUpdateInterval(initDuration(seconds = 2))
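To make the "only if a minimal interval has passed" rule concrete, here is a stand-alone sketch of that kind of throttling in plain Nim. It is an assumption about the general technique, not the library's actual code; the `Throttle` type and `shouldUpdate` proc are hypothetical names:

```nim
import std/[monotimes, times]

# Stand-alone sketch of interval-based throttling; does not use nim-metrics.
type Throttle = object
  minInterval: Duration
  lastUpdate: MonoTime

proc shouldUpdate(t: var Throttle, current: MonoTime): bool =
  ## Returns true (and records `current`) only when at least `minInterval`
  ## has elapsed since the last accepted update.
  if current - t.lastUpdate >= t.minInterval:
    t.lastUpdate = current
    return true
  false

let start = getMonoTime()
var throttle = Throttle(minInterval: initDuration(seconds = 10), lastUpdate: start)
echo throttle.shouldUpdate(start + initDuration(seconds = 3))   # false
echo throttle.shouldUpdate(start + initDuration(seconds = 10))  # true
echo throttle.shouldUpdate(start + initDuration(seconds = 12))  # false
```

The third call is rejected because only 2 seconds have passed since the update accepted at the 10 second mark.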
You can also disable this automated piggy-backing on user-defined metric value changes, if you need more regularity, and take charge of updating system metrics yourself.
# disable automatic updates
setSystemMetricsAutomaticUpdate(false)
# somewhere in your event loop, at an interval of your choice
updateThreadMetrics()
Those metrics with a "thread_id" label are thread-specific metrics. The automatic update only covers thread metrics for the main thread. You'll have to call updateThreadMetrics() yourself for any other thread you care about.
[Screenshot: Grafana showing data from Prometheus, which pulls it from Nimbus, which uses nim-metrics.]
StatsD
Add a StatsD export backend where metric updates will be pushed as soon as they are created:
import metrics, net
when defined(metrics):
  addExportBackend(
    metricProtocol = STATSD,
    netProtocol = UDP,
    address = "127.0.0.1",
    port = Port(8125)
  )
declareCounter myCounter, "some counter"
myCounter.inc()
# When we incremented the counter, the corresponding data was sent over the wire to the StatsD daemon.
The only supported collector types are counters and gauges. There's a dedicated thread that does the networking part. When you update these collectors, data is sent over a channel to that thread. If the channel's buffer is full, the data is silently dropped. Same for an unreachable backend or any other networking error. Reconnections are tried automatically and there's one socket per backend being reused.
All the complexity is hidden from the API user and additional latency is kept to a minimum. Exported metrics are treated like disposable data and dropped at the first sign of trouble.
Counters support an additional parameter just for StatsD: sampleRate. This allows sending just a percentage of the increments to the StatsD daemon. Nothing else changes on the client side.
declareCounter sCounter, "counter with a sample rate set", sampleRate = 0.1
sCounter.inc()
# Now only 10% (on average) of this counter's updates will be sent over the
# wire. We roll a die when the time comes, using a simple PRNG. We also
# inform the StatsD daemon about this rate, so it can adjust its estimated value
# accordingly.
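For intuition, here is a stand-alone sketch of client-side sampling and of the plain-text StatsD counter line that carries the rate (`<name>:<value>|c|@<rate>`). It is not the library's implementation and the exact bytes nim-metrics sends may differ; `statsdCounterLine` and `shouldSend` are hypothetical helpers:

```nim
import std/random

# Stand-alone sketch; does not use nim-metrics.
proc statsdCounterLine(name: string, increment: int, sampleRate: float): string =
  ## Formats one counter update in the plain-text StatsD line format.
  ## The "|@rate" suffix tells the daemon how to scale the value back up.
  result = name & ":" & $increment & "|c"
  if sampleRate < 1.0:
    result.add("|@" & $sampleRate)

proc shouldSend(rng: var Rand, sampleRate: float): bool =
  ## True for roughly `sampleRate` of the calls.
  rng.rand(1.0) < sampleRate

var rng = initRand(42)
var sent = 0
for _ in 1 .. 1000:
  if rng.shouldSend(0.1):
    inc sent

echo statsdCounterLine("sCounter", 1, 0.1)  # sCounter:1|c|@0.1
echo sent, " of 1000 increments would be sent"
```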
Carbon
Add a Carbon export backend where metric updates will be pushed as soon as they are created:
import metrics, net
when defined(metrics):
  addExportBackend(
    metricProtocol = CARBON,
    netProtocol = TCP,
    address = "127.0.0.1",
    port = Port(2003)
  )
The implementation is very similar to the StatsD metric exporting described above.
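As background, Carbon's plaintext protocol expects one line per data point: a metric path, a value and a Unix timestamp in seconds, separated by spaces. Here is a stand-alone sketch of building such a line; it is not taken from nim-metrics, whose exact output may differ, and `carbonLine` is a hypothetical helper:

```nim
import std/times

# Stand-alone sketch; does not use nim-metrics.
proc carbonLine(path: string, value: float64, unixSeconds: int64): string =
  ## Carbon plaintext protocol: "<metric path> <value> <timestamp>\n"
  path & " " & $value & " " & $unixSeconds & "\n"

stdout.write carbonLine("myCounter", 1.0, 1569340503)  # myCounter 1.0 1569340503
stdout.write carbonLine("myGauge", 4.5, getTime().toUnix())
```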
You may add as many export backends as you want, but deleting them from the exportBackends global variable is unsupported.
Contributing
When submitting pull requests, please add test cases for any new features or fixes and make sure nimble test is still able to execute the entire test suite successfully.
License
Licensed and distributed under either of
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option. These files may not be copied, modified, or distributed except according to those terms.