93 Commits

Author SHA1 Message Date
Paul Banks
bb985743e9 cache: Fix bug where connection errors can cause early cache expiry (#9979)
Fixes a cache bug where TTL is not updated while a value isn't changing or cache entry is returning fetch errors.
2021-04-08 10:11:46 +00:00
Paul Banks
78c1528c48 cache: fix bug where TTLs were ignored leading to leaked memory in client agents (#9978)
* Fix bug in cache where TTLs are effectively ignored

This mostly affects streaming since streaming will immediately return from Fetch calls when the state is Closed on eviction which causes the race condition every time.

However this also affects all other cache types if the fetch call happens to return between the eviction and then next time around the Get loop by any client.

There is a separate bug that allows cache items to be evicted even when there are active clients which is the trigger here.

* Add changelog entry

* Update .changelog/9978.txt
2021-04-08 10:09:29 +00:00
Daniel Nephin
4c2a861dda Merge pull request #9763 from hashicorp/dnephin/cache-warn-on-error-in-notify
cache: log a warning when Cache.Notify handles an error
2021-02-19 23:31:08 +00:00
Matt Keeler
975c196f7c Stop background refresh of cached data for requests that result in ACL not found errors (#9738) 2021-02-09 15:16:35 +00:00
Kit Patella
727780140e Merge pull request #9261 from hashicorp/telemetry/fix-missing-and-stale-docs-2
Telemetry/fix missing and stale docs
2020-11-23 21:34:59 +00:00
Kit Patella
82e7363b90 Merge pull request #9198 from hashicorp/mkcp/telemetry/add-all-metric-definitions
Add metric definitions for all metrics known at Consul start
2020-11-17 00:13:51 +00:00
Daniel Nephin
0d4fa882b3 lib/ttlcache: unexport key and additional godoc 2020-10-20 19:16:03 -04:00
Daniel Nephin
c17baadbf8 lib/ttlcache: add a constant for NotIndexed 2020-10-20 19:10:20 -04:00
Daniel Nephin
6c09ab3dd8 cache: fix a bug with Prepopulate
Prepopulate was setting entry.Expiry.HeapIndex to 0. Previously this would result in a call to heap.Fix(0)
which wasn't correct, but was also not really a problem because at worse it would re-notify.

With the recent change to extract cachettl it was changed to call Update(idx), which would have updated
the wrong entry.

A previous commit removed the setting of entry.Expiry so that the HeapIndex would be reported
as -1, and this commit adds a test and handles the -1 heap index.
2020-10-20 19:10:20 -04:00
Daniel Nephin
bbb816aa8a lib/ttlcache: extract package from agent/cache 2020-10-20 19:10:20 -04:00
Daniel Nephin
c4122edd22 cache: export ExpiryHeap
and hide internal methods on an unexported type, so that when it is extrated those methods are not exported.
2020-10-20 19:10:20 -04:00
Daniel Nephin
343d133183 cache: Refactor heap.notify to make it more explicit.
And remove duplicate notifications.

Instead of performing the check in the heap implementation, check the
index in the higher level interface (Add,Remove,Update) and notify if one
of the relevant indexes is 0.
2020-10-20 19:10:20 -04:00
Daniel Nephin
499f2822cf cache: Move more of the expiryLoop into the Heap 2020-10-20 19:10:20 -04:00
Daniel Nephin
2cdc90e01b cache: extract cache eviction heap
Start creating an interface that doesn't require using heap and hides more of the
entry internals.
2020-10-20 19:10:19 -04:00
Daniel Nephin
3fa08beecf submatview: add a test for handling of NewSnapshotToFollow
Also add some godoc
Rename some vars and functions
Fix a data race in the new cache test for entry closing.
2020-10-06 13:22:02 -04:00
Daniel Nephin
b576a2d3c7 cache-types: Update Streaming health cache-type
To use latest protobuf types
2020-10-06 13:22:02 -04:00
Daniel Nephin
132b76acef agent/cache: Add cache-type and materialized view for streaming health
Extracted from d97412ce4c399a35b41bbdae2716f0e32dce80bf

Co-authored-by: Paul Banks <banks@banksco.de>
2020-10-06 13:21:57 -04:00
Daniel Nephin
6956477be5
Merge pull request #8548 from edevil/fix_flake
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve
2020-08-28 15:10:55 -04:00
Pierre Souchay
d5974b1d17 Added Unit test for cache reloading 2020-08-28 13:03:58 +02:00
Pierre Souchay
879d087f65 Added options.Equals() and minor fixes indentation fixes 2020-08-27 13:44:45 +02:00
Pierre Souchay
d2be9d38da Ensure that Cache options are reloaded when consul reload is performed.
This will apply cache throttling parameters are properly applied:
 * cache.EntryFetchMaxBurst
 * cache.EntryFetchRate

When values are updated, a log is displayed in info.
2020-08-24 23:33:10 +02:00
André Cruz
9a0792139c
Decrease test flakiness
Fix flaky TestACLResolver_Client/Concurrent-Token-Resolve and TestCacheNotifyPolling
2020-08-24 20:30:02 +01:00
Hans Hasselberg
054595b1f8
agent/cache test for cache throttling. (#8396) 2020-07-30 14:41:13 +02:00
Matt Keeler
be01c4241d
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Pierre Souchay
505de6dc29
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler
12acdd7481
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Daniel Nephin
2ec3760b70 agent/cache: Use AllowNotModifiedResponse in CatalogListServices
Co-authored-by: Pierre Souchay <pierresouchay@users.noreply.github.com>
2020-07-14 18:58:20 -04:00
Daniel Nephin
dbb8d14728 agent/cache: Update some docstrings 2020-07-14 18:58:20 -04:00
Matt Keeler
8837907de4
Make the Agent Cache more Context aware (#8092)
Blocking queries issues will still be uncancellable (that cannot be helped until we get rid of net/rpc). However this makes it so that if calling getWithIndex (like during a cache Notify go routine) we can cancell the outer routine. Previously it would keep issuing more blocking queries until the result state actually changed.
2020-06-15 11:01:25 -04:00
Daniel Nephin
81755c860a agent/cache: remove error return from fetch
A previous change removed the only error, so the return value can be
removed now.
2020-04-17 11:55:01 -04:00
Daniel Nephin
4ef9fc9f27 agent/cache: reduce function arguments by removing duplicates
A few of the unexported functions in agent/cache took a large number of
arguments. These arguments were effectively overrides for values that
were provided in RequestInfo.

By using a struct we can not only reduce the number of arguments, but
also simplify the logic by removing the need for overrides.
2020-04-17 11:35:07 -04:00
Daniel Nephin
5fe7043439 agent/cache: Make all cache options RegisterOptions
Previously the SupportsBlocking option was specified by a method on the
type, and all the other options were specified from RegisterOptions.

This change moves RegisterOptions to a method on the type, and moves
SupportsBlocking into the options struct.

Currently there are only 2 cache-types. So all cache-types can implement
this method by embedding a struct with those predefined values. In the
future if a cache type needs to be registered more than once with different
options it can remove the embedded type and implement the method in a way
that allows for paramaterization.
2020-04-16 18:56:34 -04:00
Daniel Nephin
89f41bddfe Remove TTL from cacheEntryExpiry
This should very slightly reduce the amount of memory required to store each item in
the cache.

It will also enable setting different TTLs based on the type of result. For example
we may want to use a shorter TTL when the result indicates the resource does not exist,
as storing these types of records could easily lead to a DOS caused by
OOM.
2020-04-13 13:10:38 -04:00
Daniel Nephin
7246d8b6cb agent/cache: Reduce differences between notify implementations
These two notify functions are very similar. There appear to be just
enough differences that trying to parameterize the differences may not
improve things.

For now, reduce some of the cosmetic differences so that the material
differences are more obvious.
2020-04-13 13:10:38 -04:00
Daniel Nephin
66fbb13976 agent/cache: Inline the refresh function to make recursion more obvious
fetch is already an exceptionally long function, but hiding the
recrusion in a function call likely does not help.
2020-04-13 13:10:38 -04:00
Daniel Nephin
faeaed5d0c agent/cache: Make the return values of getEntryLocked more obvious
Use named returned so that the caller has a better idea of what these
bools mean.

Return early to reduce the scope, and make it more obvious what values
are returned in which cases. Also reduces the number of conditional
expressions in each case.
2020-04-13 13:10:38 -04:00
Daniel Nephin
e9e45545dd agent/cache: Small formatting improvements to improve readability
Remove Cache.entryKey which called a single function.
Format multiline struct creation one field per line.
2020-04-13 12:34:11 -04:00
Daniel Nephin
1f25bf88b8
Merge pull request #7596 from hashicorp/dnephin/agent-cache-type-entry
agent/cache: move typeEntry lookup to the edge
2020-04-13 12:24:07 -04:00
Daniel Nephin
c9a87be6ee agent/cache: move typeEntry lookup to the edge
This change moves all the typeEntry lookups to the first step in the exported methods,
and makes unexporter internals accept the typeEntry struct.

This change is primarily intended to make it easier to extract the container of caches
from the Cache type.

It may incidentally reduce locking in fetch, but that was not a goal.
2020-04-03 16:01:56 -04:00
Pierre Souchay
09e638a9c6
tests: more tolerance to latency for unstable test TestCacheNotifyPolling(). (#7574) 2020-04-03 10:29:38 +02:00
R.B. Boyer
12876983cf
avoid 'panic: Log in goroutine after TestCacheGet_refreshAge has completed' (#7276) 2020-02-12 10:01:51 -06:00
Anthony Scalisi
beb928f8de fix spelling errors (#7135) 2020-01-27 07:00:33 -06:00
R.B. Boyer
9566df524e
agent: cache notifications work after error if the underlying RPC returns index=1 (#6547)
Fixes #6521

Ensure that initial failures to fetch an agent cache entry using the
notify API where the underlying RPC returns a synthetic index of 1
correctly recovers when those RPCs resume working.

The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the
index for the next query from 0 to 1 for all queries, when it should
have not done so for queries that errored.

Also fixed some things that made debugging difficult:

- config entry read/list endpoints send back QueryMeta headers
- xds event loops don't swallow the cache notification errors
2019-09-26 10:42:17 -05:00
Christian Muehlhaeuser
7753b97cc7 Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
Freddy
5873c56a03
Flaky test overhaul (#6100) 2019-07-12 09:52:26 -06:00
Hans Hasselberg
33a7df3330
tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597) 2019-06-27 22:22:07 +02:00
Matt Keeler
dbc48ea3f7 Fixes race condition in Agent Cache (#5796)
* Fix race condition during a cache get

Check the entry we pulled out of the cache while holding the lock had Fetching set.
If it did then we should use the existing Waiter instead of calling fetch. The reason
this is better than just calling fetch is that fetch re-gets the entry out of the
entries map and the previous fetch may have finished. Therefore this prevents
erroneously starting a new fetch because we just missed the last update.

* Fix race condition fully

The first commit still allowed for the following scenario:

• No entry existing when checked in getWithIndex while holding the read lock
• Then by time we had reached fetch it had been created and finished.

* always use ok when returning

* comment mentioning the reading from entries.

* use cacheHit consistently
2019-05-07 11:15:49 +01:00
Kyle Havlovitz
43bfc20dc8 Test an index=0 value in cache.Notify 2019-04-25 02:11:07 -07:00
Kyle Havlovitz
c269369760 Make central service config opt-in and rework the initial registration 2019-04-24 06:11:08 -07:00
R.B. Boyer
f4a3b9d518
fix typos reported by golangci-lint:misspell (#5434) 2019-03-06 11:13:28 -06:00