The previous value was too conservative and users with many instances
were having problems because of it. This change increases the limit to
8192 which reportedly fixed most of the issues with that.
Related: #4984, #4986, #5050.
This adds acl enforcement to the two endpoints that were missing it.
Note that in the case of getting a services health by its id, we still
must first lookup the service so we still "leak" information about a
service with that ID existing. There isn't really a way around it though
as ACLs are meant to check service names.
* Updates to the Txn API for namespaces
* Update agent/consul/txn_endpoint.go
Co-Authored-By: R.B. Boyer <rb@hashicorp.com>
Co-authored-by: R.B. Boyer <public@richardboyer.net>
The backing RPC already existed but the endpoint will be useful for other service syncing processes such as consul-k8s as this endpoint can return all services registered with a node regardless of namespacing.
* Fix segfault when removing both a service and associated check
updateSyncState creates entries in the services and checks maps for
remote services/checks that are not found locally, so that we can then
make sure to delete them in our reconciliation process. However, the
values added to the map are missing key fields that the rest of the code
expects to not be nil.
* Add comment stating Check field can be nil
Something similar already happens inside of the server
(agent/consul/server.go) but by doing it in the general config parsing
for the agent we can have agent-level code rely on the PrimaryDatacenter
field, too.
* Increase raft notify buffer.
Fixes https://github.com/hashicorp/consul/issues/6852.
Increasing the buffer helps recovering from leader flapping. It lowers
the chances of the flapping leader to get into a deadlock situation like
described in #6852.
- Explicitly wait to start the test until the initial AE sync of the node.
- Run the blocking query in the main goroutine to cut down on possible
poor goroutine scheduling issues being to blame for delays.
- If the blocking query is woken up with no index change, rerun the
query. This may happen if the CI server is loaded and time dilation is
happening.
Currently when using the built-in CA provider for Connect, root certificates are valid for 10 years, however secondary DCs get intermediates that are valid for only 1 year. There is no mechanism currently short of rotating the root in the primary that will cause the secondary DCs to renew their intermediates.
This PR adds a check that renews the cert if it is half way through its validity period.
In order to be able to test these changes, a new configuration option was added: IntermediateCertTTL which is set extremely low in the tests.
* Add CreateCSRWithSAN
* Use CreateCSRWithSAN in auto_encrypt and cache
* Copy DNSNames and IPAddresses to cert
* Verify auto_encrypt.sign returns cert with SAN
* provide configuration options for auto_encrypt dnssan and ipsan
* rename CreateCSRWithSAN to CreateCSR
* Use consts for well known tagged adress keys
* Add ipv4 and ipv6 tagged addresses for node lan and wan
* Add ipv4 and ipv6 tagged addresses for service lan and wan
* Use IPv4 and IPv6 address in DNS
Deregistering a service from the catalog automatically deregisters its
checks, however the agent still performs a deregister call for each
service checks even after the service has been deregistered.
With ACLs enabled this results in logs like:
"message:consul: "Catalog.Deregister" RPC failed to server
server_ip:8300: rpc error making call: rpc error making call: Unknown
check 'check_id'"
This change removes associated checks from the agent state when
deregistering a service, which results in less calls to the servers and
supresses the error logs.
* Renamed structs.IntentionWildcard to structs.WildcardSpecifier
* Refactor ACL Config
Get rid of remnants of enterprise only renaming.
Add a WildcardName field for specifying what string should be used to indicate a wildcard.
* Add wildcard support in the ACL package
For read operations they can call anyAllowed to determine if any read access to the given resource would be granted.
For write operations they can call allAllowed to ensure that write access is granted to everything.
* Make v1/agent/connect/authorize namespace aware
* Update intention ACL enforcement
This also changes how intention:read is granted. Before the Intention.List RPC would allow viewing an intention if the token had intention:read on the destination. However Intention.Match allowed viewing if access was allowed for either the source or dest side. Now Intention.List and Intention.Get fall in line with Intention.Matches previous behavior.
Due to this being done a few different places ACL enforcement for a singular intention is now done with the CanRead and CanWrite methods on the intention itself.
* Refactor Intention.Apply to make things easier to follow.
Sometimes, we have lots of errors in cross calls between DCs (several hundreds / sec)
Enrich the log in order to help diagnose the root cause of issue.
Before we were issuing 1 watch for every service in the services listing which would have caused the agent to process many more identical events simultaneously.
Restore a few more service-kind index updates so blocking in ServiceDump works in more cases
Namely one omission was that check updates for dumped services were not
unblocking.
Also adds a ServiceDump state store test and also fix a watch bug with the
normal dump.
Follow-on from #6916
• Renamed EnterpriseACLConfig to just Config
• Removed chained_authorizer_oss.go as it was empty
• Renamed acl.go to errors.go to more closely describe its contents