We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector. While debugging this issue I found and fixed the
following problems.
1. the net.Listener was not being closed properly when Listener.Stop was
called. This caused the Listener.Serve goroutine to run forever.
Fixed by storing a reference to net.Listener and closing it properly
when Listener.Stop is called.
2. call connWG.Add in the correct place. WaitGroup.Add must be called
before starting a goroutine, not from inside the goroutine.
3. Set metrics config EnableRuntimeMetrics to `false` so that we don't
start a background goroutine in each test for no reason. There is no
way to shut down this goroutine, and it was an added distraction while
debugging these timeouts.
5. two tests were calling require.NoError from a goroutine.
require.NoError calls t.FailNow, which MUST be called from the main
test goroutine. Instead use t.Errorf, which can be called from other
goroutines and will still fail the test.
6. `assertCurrentGaugeValue` was breaking out of a for loop, which
would cause the `RWMutex.RUnlock` to be missed. Fixed by calling
unlock before `break`.
The core issue of a deadlock was fixed by https://github.com/armon/go-metrics/pull/124.
* CLI: Add support for reading internal raft snapshots to snapshot inspect
* Add snapshot inspect test for raw state files
* Add changelog entry
* Update .changelog/10089.txt
* website: add back unlinked pages to match previous state
* website: add unlinked content check
* website: add hidden nav-data to unlinked content check
* ui: Add Admin Partition feature flag
This adds a `PartitionEnabled`/`CONSUL_PARTITIONS_ENABLED` feature flag
that can be set during production from the consul binary, or
additionally during development/testing via cookies.
* Add partitions bookmarklet and docs, and move all eng docs from the main README to the docs instead.
You probably already have the app running once you need these, and it reduces the amount of text/detail in the main README.
* Add the env variable section back into the README with actual env vars
* Add inline-code CSS component
* Add %inline-code to all the places where we need it
* Inject selected env variables into the translations file
* Add ingress gateway upstream 'host header' intro text
* Make sure we can use actual correct component casing for titles but still have nice consistent menu item casing in the side nav
On a few occasions I've had to read timeout stack traces for tests and
noticed that retry.Run runs the function in a goroutine. This makes
debugging a timeout more difficult because the goroutine of the retryable
function is disconnected from the stack of the actual test. It requires
searching through the entire stack trace to find the other goroutine.
By using panic instead of runtime.Goexit() we remove the need for a
separate goroutine.
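A rough sketch of the idea, assuming a trimmed-down `R` type. The names `attemptFailed`, `runAttempt`, and `retryN` are hypothetical and are not the package's real API, and the wait between attempts is omitted:

```go
package main

import "fmt"

// attemptFailed is the sentinel value panicked with to abort one attempt.
type attemptFailed struct{}

// R is a hypothetical, trimmed-down stand-in for retry.R.
type R struct {
	failed bool
	output []string
}

// Fatalf records the message and aborts the current attempt by panicking.
func (r *R) Fatalf(format string, args ...interface{}) {
	r.output = append(r.output, fmt.Sprintf(format, args...))
	r.failed = true
	panic(attemptFailed{})
}

// runAttempt calls f on the caller's goroutine. It recovers only the
// sentinel panic; any other panic is re-raised unchanged.
func runAttempt(r *R, f func(r *R)) {
	defer func() {
		if p := recover(); p != nil {
			if _, ok := p.(attemptFailed); !ok {
				panic(p)
			}
		}
	}()
	f(r)
}

// retryN runs f up to attempts times and reports whether any attempt passed.
func retryN(attempts int, f func(r *R)) bool {
	for i := 0; i < attempts; i++ {
		r := &R{}
		runAttempt(r, f)
		if !r.failed {
			return true
		}
	}
	return false
}

func main() {
	calls := 0
	ok := retryN(5, func(r *R) {
		calls++
		if calls < 3 {
			r.Fatalf("not ready after %d calls", calls)
		}
	})
	fmt.Println(ok, calls)
}
```

Because `f` runs on the same goroutine as the caller, a timeout stack trace shows the retried function directly beneath the test that invoked it.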
Also a few other small improvements:
* add `R.Helper` so that an assertion function can be used with both
testing.T and retry.R.
* Pass t to `Retryer.NextOr`, and call `t.Helper` in a number of places
so that the line number reported by `t.Log` is the line in the test
where `retry.Run` was called, instead of some line in `retry.go` that
is not relevant to the failure.
* improve the implementation of `dedup` by removing the need to iterate
twice. Instead track the lines and skip any duplicate when writing to
the buffer.
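The single-pass approach to `dedup` might look something like this. It is a sketch under the assumption that the input is a slice of output lines; `dedupLines` is a name chosen for the example:

```go
package main

import (
	"fmt"
	"strings"
)

// dedupLines writes each distinct line once, in first-seen order,
// tracking seen lines while writing instead of making a second pass.
func dedupLines(lines []string) string {
	seen := make(map[string]struct{}, len(lines))
	var b strings.Builder
	for _, line := range lines {
		if _, ok := seen[line]; ok {
			continue // duplicate, already written
		}
		seen[line] = struct{}{}
		b.WriteString(line)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	fmt.Print(dedupLines([]string{"timeout", "not ready", "timeout"}))
}
```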
Previously canRetry attempted to retrieve this error from args, but there were never
any callers that passed an error in args.
With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.
This commit updates canRetry to check for ErrChunkingResubmit in err.
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.
This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.
Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.