consul/agent
Matt Keeler 3c1e17cbd5
Fix flaky tests in the agent/grpc/public/services/serverdiscovery package (#13173)
Occasionally TestWatchServers_ACLToken_PermissionDenied was flagged as flaky in CircleCI. This change should fix that.

Why this fixes it is complicated. The test was failing with a panic when a mocked ACL Resolver was called more times than expected. I struggled for a while to work out how that could happen. This test should call authorize exactly once, and the error it returns should terminate the stream and be returned to the gRPC client. Another oddity was that no amount of running this test locally reproduced the issue. I ran the test hundreds of thousands of times and it always passed.

It turns out that there is nothing wrong with the test. The panic from an unexpected invocation of a mocked call happened during this test, but it was caused by a previous test (specifically TestWatchServers_StreamLifecycle).

The stream from the previous test remained open after all of the test Cleanup functions had run. When the EventPublisher eventually noticed that its context had been cancelled during cleanup, it force-closed all subscriptions, which caused some loops to be re-entered and the streams to be reauthorized. It's that looping in response to forced subscription closures that eventually causes the mock to panic. All of the components (publisher, server, and client) operate based on contexts. We cancel all of those contexts, but there is no synchronous way to know when they have stopped.
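The re-entry described above can be sketched in Go. This is a minimal, hypothetical model, not Consul's code: `errSubscriptionClosed`, `subscription`, `closedSub`, and `watchLoop` are illustrative names, and `maxAttempts` exists only so the sketch terminates. Every forced close sends the loop back through authorization, so a mock that expects exactly one `authorize` call fails as soon as a stale stream keeps looping.

```go
package main

import (
	"errors"
	"fmt"
)

// errSubscriptionClosed is a hypothetical stand-in for the error a publisher
// returns when it force-closes a subscription.
var errSubscriptionClosed = errors.New("subscription closed by server, client should resubscribe")

// subscription is a hypothetical, simplified event subscription.
type subscription interface {
	Next() (string, error)
}

// closedSub models a subscription that the publisher has already force-closed,
// as happens when the publisher's context is cancelled during test cleanup.
type closedSub struct{}

func (closedSub) Next() (string, error) { return "", errSubscriptionClosed }

// watchLoop re-authorizes and resubscribes every time its subscription is
// force-closed. maxAttempts is hypothetical and only bounds the sketch.
func watchLoop(authorize func() error, subscribe func() subscription, maxAttempts int) (int, error) {
	authCalls := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		authCalls++
		if err := authorize(); err != nil {
			return authCalls, err
		}
		sub := subscribe()
		for {
			_, err := sub.Next()
			if errors.Is(err, errSubscriptionClosed) {
				break // forced close: loop around, re-authorize, resubscribe
			}
			if err != nil {
				return authCalls, err
			}
			// A real service would forward the event to the client here.
		}
	}
	return authCalls, nil
}

func main() {
	calls, _ := watchLoop(
		func() error { return nil },
		func() subscription { return closedSub{} },
		3,
	)
	// A mock expecting exactly one authorize call would already have failed.
	fmt.Println("authorize calls:", calls)
}
```

Running it prints `authorize calls: 3`: one authorization per forced close, which is the pattern a strict mock rejects.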

We could have implemented a synchronous stop, but in an actual running Consul, context cancellation plus asynchronous stopping is perfectly fine. What we (Dan and I) eventually concluded was that the behavior of gRPC streams such as this one when a server is shutting down wasn't very helpful. What we want is for a client to be able to distinguish between a subscription closed because something may have changed and requires re-authentication, and a subscription closed because the server is shutting down. That way we can send back an appropriate error message stating that the server is shutting down, rather than confusing users into thinking they may need to resubscribe.

So that's what this PR does. We have introduced a shutting-down state to our event subscriptions, and the various streaming gRPC services that rely on the event publisher now actually stop the stream (rather than attempting transparent reauthorization) when this particular error is the one they get from the stream. Additionally, the error transmitted back through gRPC when this occurs indicates to the consumer that the server is going away. That is more helpful, because a client can then attempt to reconnect to another server.
2022-05-23 08:59:13 -04:00
ae sdk: add TestLogLevel for setting log level in tests 2022-02-03 13:42:28 -05:00
auto-config peering: initial sync (#12842) 2022-04-21 17:34:40 -05:00
cache proxycfg: remove dependency on `cache.UpdateEvent` (#13144) 2022-05-20 15:47:40 +01:00
cache-types Watch the singular service resolver instead of the list + filtering to 1 (#13012) 2022-05-12 16:34:17 -04:00
checks Merge pull request #12685 from hashicorp/http-check-redirect-option 2022-04-07 11:29:27 -07:00
config Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
configentry Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
connect Support vault namespaces in connect CA (#12904) 2022-05-04 19:41:55 -07:00
consul Fix flaky tests in the agent/grpc/public/services/serverdiscovery package (#13173) 2022-05-23 08:59:13 -04:00
debug bulk rewrite using this script 2022-01-20 10:46:23 -06:00
dns test: fix incorrect use of t instead of r in retry test (#13146) 2022-05-19 14:00:07 -05:00
exec re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
grpc remove remaining shim runStep functions (#13015) 2022-05-10 16:24:45 -05:00
local Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
metadata partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
mock
pool Add timeout to Client RPC calls (#11500) 2022-04-21 16:21:35 -04:00
proxycfg proxycfg: remove dependency on `cache.UpdateEvent` (#13144) 2022-05-20 15:47:40 +01:00
router sdk: add TestLogLevel for setting log level in tests 2022-02-03 13:42:28 -05:00
routine-leak-checker Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
rpc peering: accept replication stream of discovery chain information at the importing side (#13151) 2022-05-19 16:37:52 -05:00
rpcclient/health agent: allow for service discovery queries involving peer name to use streaming (#13168) 2022-05-20 15:27:01 -05:00
structs peering: accept replication stream of discovery chain information at the importing side (#13151) 2022-05-19 16:37:52 -05:00
submatview proxycfg: remove dependency on `cache.UpdateEvent` (#13144) 2022-05-20 15:47:40 +01:00
systemd
token agent/token: rename `agent_master` to `agent_recovery` (internally) (#11744) 2021-12-07 12:12:47 +00:00
uiserver Add an internal env var for managed cluster config in the ui (#12796) 2022-04-15 09:55:52 -07:00
xds proxycfg: remove dependency on `cache.UpdateEvent` (#13144) 2022-05-20 15:47:40 +01:00
acl.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
acl_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
acl_endpoint_legacy.go
acl_endpoint_legacy_test.go agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
acl_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
acl_oss.go agent: support `X-Consul-Results-Filtered-By-ACLs` header in agent-local endpoints (#11610) 2021-12-03 20:36:28 +00:00
acl_test.go Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
agent.go proxycfg: remove dependency on `cache.UpdateEvent` (#13144) 2022-05-20 15:47:40 +01:00
agent_endpoint.go Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
agent_endpoint_oss.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
agent_endpoint_oss_test.go Add oss test 2022-05-09 10:07:19 -07:00
agent_endpoint_test.go Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
agent_oss.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
agent_test.go Update go version to 1.18.1 2022-04-18 11:41:10 -04:00
apiserver.go
apiserver_test.go
catalog_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
catalog_endpoint_oss.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
catalog_endpoint_test.go catalog: compare node names case insensitively in more places (#12444) 2022-02-24 16:54:47 -06:00
check.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
config_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
config_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
connect_auth.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
connect_ca_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
connect_ca_endpoint_test.go Update go version to 1.18.1 2022-04-18 11:41:10 -04:00
coordinate_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
coordinate_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
delegate_mock_test.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
denylist.go
denylist_test.go
discovery_chain_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
discovery_chain_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
dns.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
dns_oss.go Fixup acl.EnterpriseMeta 2022-04-05 15:11:49 -07:00
dns_test.go Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
enterprise_delegate_oss.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
event_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
event_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
federation_state_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
health_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
health_endpoint_test.go agent: allow for service discovery queries involving peer name to use streaming (#13168) 2022-05-20 15:27:01 -05:00
http.go docs: clarify consistency mode operation 2022-05-09 16:39:48 -07:00
http_decode_test.go
http_oss.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
http_oss_test.go Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
http_register.go [sync oss] api: add peering api module (#12911) 2022-05-02 11:49:05 -07:00
http_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
intentions_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
intentions_endpoint_oss_test.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
intentions_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
keyring.go Allows keyring operations on client agents 2022-02-24 17:24:57 +00:00
keyring_test.go Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
kvs_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
kvs_endpoint_test.go
metrics.go agent: move agent tls metric monitor to a more appropriate place 2021-10-27 16:26:09 -04:00
metrics_test.go add more labels to RequestRecorder (#12727) 2022-04-12 10:50:25 -07:00
nodeid.go
nodeid_test.go
notify.go
notify_test.go
operator_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
operator_endpoint_oss.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
operator_endpoint_test.go
peering_endpoint.go peering: accept replication stream of discovery chain information at the importing side (#13151) 2022-05-19 16:37:52 -05:00
peering_endpoint_oss_test.go [sync oss] api: add peering api module (#12911) 2022-05-02 11:49:05 -07:00
peering_endpoint_test.go add err msg on PeeringRead not found (#12986) 2022-05-09 15:22:42 -07:00
prepared_query_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
prepared_query_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
reload.go
remote_exec.go
remote_exec_test.go Remove references to "master" ACL tokens in tests (#11751) 2021-12-07 12:48:50 +00:00
retry_join.go agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
retry_join_test.go
service_checks_test.go
service_manager.go
service_manager_test.go bulk rewrite using this script 2022-01-20 10:46:23 -06:00
session_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
session_endpoint_test.go
setup.go Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
setup_oss.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
sidecar_service.go
sidecar_service_test.go bulk rewrite using this script 2022-01-20 10:46:23 -06:00
signal_unix.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
signal_windows.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
snapshot_endpoint.go
snapshot_endpoint_test.go
status_endpoint.go
status_endpoint_test.go
streaming_test.go regenerate expired certs (#11462) 2021-11-01 11:40:16 -04:00
testagent.go Retry on bad dogstatsd connection (#13091) 2022-05-19 16:03:46 -04:00
testagent_test.go
translate_addr.go
txn_endpoint.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
txn_endpoint_test.go Unify various status errors into one HTTP error type. (#12594) 2022-04-29 13:42:49 -04:00
ui_endpoint.go When a host header is defined override `req.Host` in the metrics ui (#13071) 2022-05-13 14:05:22 -04:00
ui_endpoint_oss_test.go re-run gofmt on 1.17 (#11579) 2021-11-16 12:04:01 -06:00
ui_endpoint_test.go Support for connect native services in topology view. (#12098) 2022-02-16 16:51:54 -05:00
user_event.go Vendor in rpc mono repo for net/rpc fork, go-msgpack, msgpackrpc. (#12311) 2022-02-14 09:45:45 -08:00
user_event_test.go
util.go Remove some usage of md5 from the system (#11491) 2021-11-04 13:07:54 -07:00
util_test.go Remove some usage of md5 from the system (#11491) 2021-11-04 13:07:54 -07:00
watch_handler.go
watch_handler_test.go