Commit Graph

3 Commits

Author SHA1 Message Date
Raúl Kripalani 31d951f370 implement automatic heapdumps when usage is above threshold.
A heapdump will be captured when the usage crosses the threshold.
Staying above the threshold won't trigger another heapdump.
If the usage goes down, then back up, that is considered another
"episode" to be captured in a heapdump.

This feature is driven by three parameters:

* HeapdumpDir: the directory where the watchdog will write the heapdump.
  It will be created if it doesn't exist upon initialization. An error when
  creating the directory will not prevent watchdog initialization; it will
  just disable the heapdump capture feature.

  If zero-valued, the feature is disabled. Heapdumps will be written to path:
  <HeapdumpDir>/<RFC3339Nano formatted timestamp>.heap.

* HeapdumpMaxCaptures: sets the maximum number of heapdumps a process will
  generate. This limits the number of episodes that will be captured, in case
  the utilization climbs repeatedly over the threshold. By default, it is 10.

* HeapdumpThreshold: sets the utilization threshold that will trigger a
  heap dump to be taken automatically. A zero value disables this feature.
  By default, it is disabled.
2021-01-19 20:02:16 +00:00
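
The episode semantics described in the commit above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the `heapdumper` type, its fields and its methods are hypothetical names, the threshold is assumed to be a fraction of the limit, and the dump is assumed to be written with `pprof.WriteHeapProfile`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime/pprof"
	"time"
)

// heapdumper is a hypothetical helper; it is not go-watchdog's actual type.
type heapdumper struct {
	dir         string  // mirrors HeapdumpDir; empty disables capture
	threshold   float64 // mirrors HeapdumpThreshold; zero disables capture
	maxCaptures int     // mirrors HeapdumpMaxCaptures
	captures    int     // captures granted so far
	above       bool    // true while the current episode is ongoing
}

// shouldCapture reports whether this usage sample opens a new episode
// that should be captured. Staying above the threshold returns false.
func (h *heapdumper) shouldCapture(usage float64) bool {
	if h.dir == "" || h.threshold == 0 {
		return false // feature disabled
	}
	if usage <= h.threshold {
		h.above = false // episode over; the next crossing is a new one
		return false
	}
	if h.above {
		return false // still inside the same episode
	}
	h.above = true
	if h.captures >= h.maxCaptures {
		return false // capture budget exhausted
	}
	h.captures++
	return true
}

// write stores a heap profile at <dir>/<RFC3339Nano timestamp>.heap.
func (h *heapdumper) write(now time.Time) (string, error) {
	if err := os.MkdirAll(h.dir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(h.dir, now.Format(time.RFC3339Nano)+".heap")
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	return path, pprof.WriteHeapProfile(f)
}

func main() {
	dir, err := os.MkdirTemp("", "heapdumps")
	if err != nil {
		fmt.Println("cannot create dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	h := &heapdumper{dir: dir, threshold: 0.8, maxCaptures: 10}
	// Usage rises, stays high, drops, and rises again: two episodes.
	for _, usage := range []float64{0.5, 0.85, 0.9, 0.6, 0.95} {
		if h.shouldCapture(usage) {
			path, err := h.write(time.Now())
			fmt.Println("captured:", path, err)
		}
	}
}
```

Note that RFC3339Nano timestamps contain colons, so file names of this form are not valid on every filesystem.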
Raúl Kripalani 5f00469e3a remove 'immediate' flag in policies. 2020-12-09 15:35:29 +00:00
Raúl Kripalani 4558d98653 major rewrite of go-watchdog.
This commit introduces a major rewrite of go-watchdog.

* HeapDriven and SystemDriven are now distinct run modes.
* WIP ProcessDriven that uses cgroups.
* Policies are now stateless, pure and greatly simplified.
* Policies now return the next utilization at which GC
  should run. The watchdog enforces that value differently
  depending on the run mode.
* The heap-driven run mode adjusts GOGC dynamically. This
  places the responsibility on the Go runtime to honour the
  trigger point, and results in more robust logic that is not
  vulnerable to very quick bursts within sampling periods.
* The heap-driven run mode is no longer polling (interval-driven).
  Instead, it relies entirely on GC signals.
* The Silence and Emergency features of the watermark policy
  have been removed. If utilization is above the last watermark,
  the policy will request immediate GC.
* Races removed.
2020-12-08 14:19:04 +00:00