When our peer deletes the peering it is locally marked as terminated.
This termination should kick off deleting all imported data, but should
not delete the peering object itself.
Keeping peerings marked as terminated acts as a signal that the action
took place.
Once a peering is marked for deletion a new leader routine will now
clean up all imported resources and then the peering itself.
A lot of the logic was grabbed from the namespace/partitions deferred
deletions but with a handful of simplifications:
- The rate limiting is not configurable.
- Deleting imported nodes/services/checks is done by deleting nodes with
the Txn API. The services and checks are deleted as a side-effect.
- There is no "round rate limiter" like with namespaces and partitions.
This is because peerings are purely local, and deleting a peering in
the datacenter does not depend on deleting data from other DCs like
with WAN-federated namespaces. All rate limiting is handled by the
Raft rate limiter.
1. Fix a bug where the peering leader routine would not track all active
peerings in the "stored" reconciliation map. This could lead to
tearing down streams where the token was generated, since the
ConnectedStreams() method used for reconciliation returns all streams
and not just the ones initiated by this leader routine.
2. Fix a race where stream contexts were being canceled before
termination messages were being processed by a peer.
Previously the leader routine would tear down streams by canceling
their context right after the termination message was sent. This
context cancelation could be propagated to the server side faster
than the termination message. Now there is a change where the
dialing peer uses CloseSend() to signal when no more messages will
be sent. Eventually the server peer will read an EOF after receiving
and processing the preceding termination message.
Using CloseSend() is actually not enough to address the issue
mentioned, since it doesn't wait for the server peer to finish
processing messages. Because of this now the dialing peer also reads
from the stream until an error signals that there are no more
messages. Receiving an EOF from our peer indicates that they
processed the termination message and have no additional work to do.
Given that the stream is being closed, all the messages received by
Recv are discarded. We only check for errors to avoid importing new
data.
When deleting a peering we do not want to delete the peering and all
imported data in a single operation, since deleting a large amount of
data at once could overload Consul.
Instead we defer deletion of peerings so that:
1. When a peering deletion request is received via gRPC the peering is
marked for deletion by setting the DeletedAt field.
2. A leader routine will monitor for peerings that are marked for
deletion and kick off a throttled deletion of all imported resources
before deleting the peering itself.
This commit mostly addresses point #1 by modifying the peering service
to mark peerings for deletion. Another key change is to add a
PeeringListDeleted state store function which can return all peerings
marked for deletion. This function is what will be watched by the
deferred deletion leader routine.
Previously, imported data would never be deleted. As
nodes/services/checks were registered and deregistered, resources
deleted from the exporting cluster would accumulate in the imported
cluster.
This commit makes updates to replication so that whenever an update is
received for a service name we reconcile what was present in the catalog
against what was received.
This handleUpdateService method can handle both updates and deletions.
When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain.
This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.
Mesh gateways can use hostnames in their tagged addresses (#7999). This is useful
if you were to expose a mesh gateway using a cloud networking load balancer appliance
that gives you a DNS name but no reliable static IPs.
Envoy cannot accept hostnames via EDS and those must be configured using CDS.
There was already logic when configuring gateways in other locations in the code, but
given the illusions in play for peering the downstream of a peered service wasn't aware
that it should be doing that.
Also:
- ensuring that we always try to use wan-like addresses to cross peer boundaries.
Require use of mesh gateways in order for service mesh data plane
traffic to flow between peers.
This also adds plumbing for envoy integration tests involving peers, and
one starter peering test.
Upgrade ember-composable-helpers to version 5.x. This version contains the pick-helper which makes composition in the template layer easier with Octane.
{{!-- this is usually hard to do with Octane --}}
<input {{on "input" (pick "target.value" this.updateText)}} .../>
Version 5.x also fixes a regression with sort-by that according to @johncowen was the reason why the version was pinned to 4.0.0 at the moment.
Version 5 of ember-composable-helpers removes the contains-helper in favor of includes which I changed all occurences for.
* when enterprise meta are wildcard assume it's a service intention
* fix partition and namespace
* move kind outside the loops
* get the kind check outside the loop and add a comment
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Reported in #12288
The initial test reported was ported and accurately reproduced the issue.
However, since it is a test of an upstream library's internal behavior it won't
be codified in our test suite. Refer to the ticket/PR for details on how to
demonstrate the behavior.
* update gateway-services table with endpoints
* fix failing test
* remove unneeded config in test
* rename "endpoint" to "destination"
* more endpoint renaming to destination in tests
* update isDestination based on service-defaults config entry creation
* use a 3 state kind to be able to set the kind to unknown (when neither a service or a destination exist)
* set unknown state to empty to avoid modifying alot of tests
* fix logic to set the kind correctly on CRUD
* fix failing tests
* add missing tests and fix service delete
* fix failing test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* fix a bug with kind and add relevant test
* fix compile error
* fix failing tests
* add kind to clone
* fix failing tests
* fix failing tests in catalog endpoint
* fix service dump test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* remove duplicate tests
* first draft of destinations intention in connect proxy
* remove ServiceDestinationList
* fix failing tests
* fix agent/consul failing tests
* change to filter intentions in the state store instead of adding a field.
* fix failing tests
* fix comment
* fix comments
* store service kind destination and add relevant tests
* changes based on review
* filter on destinations when querying source match
* change state store API to get an IntentionTarget parameter
* add intentions tests
* add destination upstream endpoint
* fix failing test
* fix failing test and a bug with wildcard intentions
* fix failing test
* Apply suggestions from code review
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
* add missing test and clarify doc
* fix style
* gofmt intention.go
* fix merge introduced issue
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
* update gateway-services table with endpoints
* fix failing test
* remove unneeded config in test
* rename "endpoint" to "destination"
* more endpoint renaming to destination in tests
* update isDestination based on service-defaults config entry creation
* use a 3 state kind to be able to set the kind to unknown (when neither a service or a destination exist)
* set unknown state to empty to avoid modifying alot of tests
* fix logic to set the kind correctly on CRUD
* fix failing tests
* add missing tests and fix service delete
* fix failing test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* fix a bug with kind and add relevant test
* fix compile error
* fix failing tests
* add kind to clone
* fix failing tests
* fix failing tests in catalog endpoint
* fix service dump test
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* remove duplicate tests
* first draft of destinations intention in connect proxy
* remove ServiceDestinationList
* fix failing tests
* fix agent/consul failing tests
* change to filter intentions in the state store instead of adding a field.
* fix failing tests
* fix comment
* fix comments
* store service kind destination and add relevant tests
* changes based on review
* filter on destinations when querying source match
* Apply suggestions from code review
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
* fix style
* Apply suggestions from code review
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
* rename destinationType to targetType.
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
The upgrade compatibility tests need frequent retrying to pass because they are slightly flaky. To alleviate manual work borrow similar gotestsum logic from the main go tests CI job to rerun them automatically here.