Fixes: https://github.com/hashicorp/consul/issues/3676
This fixes a bug were registering an agent with a non-existent ACL token can prevent other
services registered with a good token from being synced to the server when using
`acl_enforce_version_8 = false`.
## Background
When `acl_enforce_version_8` is off the agent does not check the ACL token validity before
storing the service in its state.
When syncing a service registered with a missing ACL token we fall into the default error
handling case (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1255)
and stop the sync (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1082)
without setting its Synced property to true like in the permission denied case.
This means that the sync will always stop at the faulty service(s).
The order in which the services are synced is random since we iterate on a map. So eventually
all services with good ACL tokens will be synced, this can however take some time and is influenced
by the cluster size, the bigger the slower because retries are less frequent.
Having a service in this state also prevent all further sync of checks as they are done after
the services.
## Changes
This change modify the sync process to continue even if there is an error.
This fixes the issue described above as well as making the sync more error tolerant: if the server repeatedly refuses
a service (the ACL token could have been deleted by the time the service is synced, the servers
were upgraded to a newer version that has more strict checks on the service definition...).
Then all services and check that can be synced will, and those that don't will be marked as errors in
the logs instead of blocking the whole process.
* Adding the new backup guide
* Update website/source/docs/guides/backup.html.md
Looks good.
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/guides/backup.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/guides/backup.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/guides/backup.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Updated the directions for the restore command.
* Update website/source/docs/guides/backup.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* Update website/source/docs/guides/backup.html.md
Co-Authored-By: kaitlincarter-hc <43049322+kaitlincarter-hc@users.noreply.github.com>
* updated the token env
* Trying to make it extra clear where to run the commands.
* added not that list of backed up items isn't inclusive
There has been some confusion about the formating of multi-line
string variables in the Helm chart. This adds examples for these
situations, hopefully clarifying things for users.
* added a new section for adding servers, updated section titles, and added code snippets.
* Fixing typos
* fixing typos
* Addressing some of Paul's feedback.
* Updated the outage recovery recommendation
* Adding examples and a summary. Minor structure updates.
* Added a link to the deployment guide, but needed to remove a sentence referring to a guide that's not published yet.
* fixed typo
This PR both prevents a blank CA config from being written out to
a snapshot and allows Consul to gracefully recover from a snapshot
with an invalid CA config.
Fixes#4954.