consul

Commit Graph

Author	SHA1	Message	Date
Hans Hasselberg	51a8e15cf8	Mark its own cluster as healthy when rebalancing. (#8406) This code started as an optimization to avoid doing an RPC Ping to itself. But in a single server cluster the rebalancing was led to believe that there were no healthy servers because foundHealthyServer was not set. Now this is being set properly. Fixes #8401 and #8403.	2020-08-06 10:42:09 +02:00
Hans Hasselberg	0f343332da	Merge pull request #7966 from hashicorp/pool_improvements Agent connection pool cleanup	2020-06-04 08:56:26 +02:00
Matt Keeler	0e4c65d422	Fix segfault due to race condition for checking server versions (#7957) The ACL monitoring routine uses c.routers to check for server version updates. Therefore it needs to be started after initializing the routers.	2020-06-03 10:36:32 -04:00
Hans Hasselberg	c45432014b	pool: remove version The version field has been used to decide which multiplexing to use. It was introduced in `2457293dce`. But this is 6y ago and there is no need for this differentiation anymore.	2020-05-28 23:06:01 +02:00
Daniel Nephin	c662f0f0de	Fix a number of problems found by staticcheck Some of these problems are minor (unused vars), but others are real bugs (ignored errors). Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>	2020-05-19 16:50:14 -04:00
Hans Hasselberg	51549bd232	rpc: oss changes for network area connection pooling (#7735)	2020-04-30 22:12:17 +02:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	61d8778210	Sync some feature flag support from enterprise (#7167)	2020-01-29 13:21:38 -05:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Christian Muehlhaeuser	7753b97cc7	Simplified code in various places (#6176) All these changes should have no side-effects or change behavior: - Use bytes.Buffer's String() instead of a conversion - Use time.Since and time.Until where fitting - Drop unnecessary returns and assignment	2019-07-20 09:37:19 -04:00
James Phillips	d9a6e2a901	Makes server manager shift away from failed servers from Serf events. Because this code was doing pointer equality checks, it would work for the case of a failed attempted RPC because the objects are from the manager itself: https://github.com/hashicorp/consul/blob/v1.0.3/agent/consul/rpc.go#L283-L302 But the pointer check would always fail for events coming in from the Serf path because the server object is newly-created: https://github.com/hashicorp/consul/blob/v1.0.3/agent/router/serf_adapter.go#L14-L40 This means that we didn't proactively shift RPC traffic away from a failed server, we'd have to wait for an RPC to fail, which exposes the error to the calling client. By switching over to a name check vs. a pointer check we get the correct behavior. We added a DEBUG log as well to help observe this behavior during integrated testing. Related to #3863 since the fix here needed the same logic duplicated, owing to the complicated atomic stuff. /cc @dadgar for a heads up in case this also affects Nomad.	2018-02-05 17:56:00 -08:00
James Phillips	85e678fbdd	Saves the cycled server list after a failed ping when rebalancing. (#3662) Fixes #3463	2017-11-07 18:13:23 -08:00
Frank Schroeder	16c58da27d	agent: drop unused code This code from http://github.com/hashicorp/consul/pull/3353 is no longer required.	2017-08-22 00:02:46 +02:00
Frank Schroeder	7cff50a4df	agent: move agent/consul/agent to agent/metadata	2017-08-09 14:36:52 +02:00
Frank Schroeder	c395599cea	agent: move agent/consul/servers to agent/router	2017-08-09 14:36:37 +02:00

15 Commits
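
The name-versus-pointer fix described in commit d9a6e2a901 can be illustrated with a minimal Go sketch. The `Server` type and the two lookup helpers below are hypothetical stand-ins, not Consul's actual code. They only show why a pointer comparison finds a server that came out of the manager's own list (the failed-RPC path) but misses a newly created object that describes the same server (the Serf event path), while a name comparison matches both.

```go
package main

import "fmt"

// Server is a hypothetical stand-in for the server metadata a manager tracks.
type Server struct {
	Name string
	Addr string
}

// indexByPointer looks a server up by pointer identity.
// It only matches objects taken from the list itself.
func indexByPointer(list []*Server, s *Server) int {
	for i, srv := range list {
		if srv == s {
			return i
		}
	}
	return -1
}

// indexByName looks a server up by its name.
// It also matches a newly created object describing the same server.
func indexByName(list []*Server, s *Server) int {
	for i, srv := range list {
		if srv.Name == s.Name {
			return i
		}
	}
	return -1
}

func main() {
	managed := []*Server{
		{Name: "s1.dc1", Addr: "10.0.0.1:8300"},
		{Name: "s2.dc1", Addr: "10.0.0.2:8300"},
	}

	// Failed-RPC path (assumed): the caller holds a pointer from the list.
	fromRPC := managed[1]
	// Serf-event path (assumed): a new object is built from the event.
	fromSerf := &Server{Name: "s2.dc1", Addr: "10.0.0.2:8300"}

	fmt.Println(indexByPointer(managed, fromRPC))  // 1
	fmt.Println(indexByPointer(managed, fromSerf)) // -1
	fmt.Println(indexByName(managed, fromSerf))    // 1
}
```

With the pointer check, the Serf-originated failure is never matched, so traffic would only shift after an RPC actually fails. The name check lets both paths mark the same server as failed.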