Paul Banks 3ad754ca7b
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistence timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00

UNRELEASED

1.3.0 (April 22nd, 2021)

IMPROVEMENTS

  • Added metrics for oldestLogAge and lastRestoreDuration to monitor capacity issues that can cause unrecoverable cluster failure [GH-452][GH-454]
  • Made TrailingLogs, SnapshotInterval and SnapshotThreshold reloadable at runtime using a new ReloadConfig method. This allows recovery from cases where too few logs are retained for followers to catch up after a restart. [GH-444]
  • Inclusify the repository by switching to main [GH-446]
  • Add option for a buffered ApplyCh if MaxAppendEntries is enabled [GH-445]
  • Add string to LogType for more human readable debugging [GH-442]
  • Extract fuzzy testing into its own module [GH-459]
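
The reloadable-settings entry above can be illustrated with a minimal sketch. It does not use the library itself: `reloadableConfig`, `node`, `config` and `reload` are hypothetical stand-ins, the 5ms lower bound is an arbitrary illustrative check, and the numeric values are examples only. The idea shown is validate first, then swap the settings atomically so a running node picks them up without a restart.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// reloadableConfig is a hypothetical stand-in for the three settings the
// changelog entry names as reloadable. It is not the library's type.
type reloadableConfig struct {
	TrailingLogs      uint64
	SnapshotInterval  time.Duration
	SnapshotThreshold uint64
}

// node is a hypothetical stand-in for a running Raft instance.
type node struct {
	conf atomic.Value // always holds a reloadableConfig
}

func newNode(initial reloadableConfig) *node {
	n := &node{}
	n.conf.Store(initial)
	return n
}

// config returns a copy of the settings currently in effect.
func (n *node) config() reloadableConfig {
	return n.conf.Load().(reloadableConfig)
}

// reload validates the new settings and then swaps them in atomically.
// On a validation error the previous settings stay in effect.
func (n *node) reload(rc reloadableConfig) error {
	// Assumption: an arbitrary lower bound, chosen only for illustration.
	if rc.SnapshotInterval < 5*time.Millisecond {
		return errors.New("SnapshotInterval is too low")
	}
	n.conf.Store(rc)
	return nil
}

func main() {
	n := newNode(reloadableConfig{
		TrailingLogs:      10240,
		SnapshotInterval:  120 * time.Second,
		SnapshotThreshold: 8192,
	})

	// Start from the current settings and retain more logs, so that a
	// restarted follower can catch up from the log.
	rc := n.config()
	rc.TrailingLogs = 100000
	if err := n.reload(rc); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	fmt.Println("trailing logs now:", n.config().TrailingLogs)
}
```

Reading the current settings, changing one field and passing the whole struct back keeps the other two values unchanged, which is the usual reason to reload a struct rather than individual fields.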

BUG FIXES

  • Update LogCache StoreLogs() to capture an error that would previously cause a panic [GH-460]

1.2.0 (October 5th, 2020)

IMPROVEMENTS

  • Remove StartAsLeader configuration option [GH-364]
  • Allow futures to react to Shutdown() to prevent a deadlock with takeSnapshot() [GH-390]
  • Prevent non-voters from becoming eligible for leadership elections [GH-398]
  • Remove an unneeded io.Copy from snapshot writes [GH-399]
  • Log decoded candidate address in duplicate requestVote warning [GH-400]
  • Prevent starting a TCP transport when IP address is nil [GH-403]
  • Reject leadership transfer requests when in candidate state to prevent indefinite blocking while unable to elect a leader [GH-413]
  • Add labels for metric metadata to reduce cardinality of metric names [GH-409]
  • Add peers metric [GH-413]

BUG FIXES

  • Make LeaderCh always deliver the latest leadership transition [GH-384]
  • Handle updating an existing peer in startStopReplication [GH-419]

1.1.2 (January 17th, 2020)

FEATURES

  • Improve FSM apply performance through batching. Implementing the BatchingFSM interface enables this new feature [GH-364]
  • Add ability to obtain Raft configuration before Raft starts with GetConfiguration [GH-369]
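
As a rough illustration of the batching entry above, the sketch below uses local stand-in types rather than the library's own: `logEntry`, `fsm`, `batchingFSM`, `counterFSM` and `applyAll` are all hypothetical names. It shows the general pattern only: a caller checks whether the state machine offers a batch method and, if so, hands over all committed entries in one call so that per-call overhead (here, one lock acquisition) is paid once per batch.

```go
package main

import (
	"fmt"
	"sync"
)

// logEntry is a hypothetical stand-in for a committed log entry.
type logEntry struct {
	Index uint64
	Data  string
}

// fsm applies one entry at a time.
type fsm interface {
	Apply(l *logEntry) interface{}
}

// batchingFSM is an optional extension: apply many entries in one call and
// return one result per entry, in the same order.
type batchingFSM interface {
	fsm
	ApplyBatch(ls []*logEntry) []interface{}
}

// counterFSM records applied data and counts lock acquisitions as a proxy
// for per-call overhead.
type counterFSM struct {
	mu      sync.Mutex
	applied []string
	locks   int
}

func (c *counterFSM) Apply(l *logEntry) interface{} {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.locks++
	c.applied = append(c.applied, l.Data)
	return l.Index
}

func (c *counterFSM) ApplyBatch(ls []*logEntry) []interface{} {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.locks++ // one acquisition for the whole batch
	out := make([]interface{}, len(ls))
	for i, l := range ls {
		c.applied = append(c.applied, l.Data)
		out[i] = l.Index
	}
	return out
}

// applyAll uses the batch path when the FSM offers one and falls back to
// entry-by-entry application otherwise.
func applyAll(f fsm, ls []*logEntry) []interface{} {
	if b, ok := f.(batchingFSM); ok {
		return b.ApplyBatch(ls)
	}
	out := make([]interface{}, 0, len(ls))
	for _, l := range ls {
		out = append(out, f.Apply(l))
	}
	return out
}

func main() {
	f := &counterFSM{}
	res := applyAll(f, []*logEntry{
		{Index: 1, Data: "a"},
		{Index: 2, Data: "b"},
		{Index: 3, Data: "c"},
	})
	fmt.Println("results:", len(res), "lock acquisitions:", f.locks)
}
```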

IMPROVEMENTS

  • Remove lint violations and add a make rule for running the linter.
  • Replace logger with hclog [GH-360]
  • Read latest configuration independently from main loop [GH-379]

BUG FIXES

  • Export the leader field in LeaderObservation [GH-357]
  • Fix snapshot to not attempt to truncate a negative range [GH-358]
  • Check for shutdown in inmemPipeline before sending RPCs [GH-276]

1.1.1 (July 23rd, 2019)

FEATURES

  • Add support for extensions to be sent on log entries [GH-353]
  • Add config option to skip snapshot restore on startup [GH-340]
  • Add optional configuration store interface [GH-339]

IMPROVEMENTS

  • Break out of group commit early when no logs are present [GH-341]

BUG FIXES

  • Fix 64-bit counters on 32-bit platforms [GH-344]
  • Don't defer closing source in recover/restore operations since it's in a loop [GH-337]
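
The last fix above concerns a general Go pitfall: a `defer` inside a loop does not run until the enclosing function returns, so every resource opened in the loop stays open at once. The sketch below is not the library's code. `source`, `openSource`, `restoreDeferred` and `restoreExplicit` are hypothetical names, and the open-count bookkeeping exists only to make the difference observable.

```go
package main

import "fmt"

// source is a hypothetical stand-in for a snapshot reader. It tracks how
// many sources are open through a shared counter.
type source struct {
	open *int
}

func openSource(open *int) *source {
	*open++
	return &source{open: open}
}

func (s *source) Close() { *s.open-- }

// restoreDeferred defers Close inside the loop, so no source is closed
// until the function returns. It reports the peak number open at once.
func restoreDeferred(n int) int {
	open, peak := 0, 0
	for i := 0; i < n; i++ {
		s := openSource(&open)
		defer s.Close() // runs only when restoreDeferred returns
		if open > peak {
			peak = open
		}
	}
	return peak
}

// restoreExplicit closes each source before the next iteration, so at most
// one source is open at any time.
func restoreExplicit(n int) int {
	open, peak := 0, 0
	for i := 0; i < n; i++ {
		s := openSource(&open)
		if open > peak {
			peak = open
		}
		s.Close()
	}
	return peak
}

func main() {
	fmt.Println("peak open with defer in loop:", restoreDeferred(3))
	fmt.Println("peak open with explicit close:", restoreExplicit(3))
}
```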

1.1.0 (May 23rd, 2019)

FEATURES

  • Add transfer leadership extension [GH-306]

IMPROVEMENTS

BUG FIXES

  • Copy the contents of an InmemSnapshotStore when opening a snapshot [GH-270]
  • Fix logging panic when converting parameters to strings [GH-332]

1.0.1 (April 12th, 2019)

IMPROVEMENTS

  • InMemTransport: Add timeout for sending a message [GH-313]
  • Ensure 'make deps' downloads test dependencies like testify [GH-310]
  • Clarify the function of CommitTimeout [GH-309]
  • Add additional metrics regarding log dispatching and committal [GH-316]

1.0.0 (October 3rd, 2017)

v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, so it introduces some breaking API changes. It also versions the Raft protocol and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility). See https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces.

0.1.0 (September 29th, 2017)

v0.1.0 is the original stable version of the library that was in main and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0.