consul

Commit Graph

Author	SHA1	Message	Date
R.B. Boyer	1535844c62	gossip: refactor some gossip related libraries into a central place (#21036) This refactors and relocates the following packages to live under internal/gossip instead of either in the toplevel lib or agent/consul: - librtt : related to serf coordinates - libserf : random serf stuff	2024-05-07 10:30:49 -05:00
hashicorp-copywrite[bot]	5fb9df1640	[COMPLIANCE] License changes (#18443) * Adding explicit MPL license for sub-package This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository. * Updating the license from MPL to Business Source License Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at <Blog URL>, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl. * add missing license headers * Update copyright file headers to BUSL-1.1 --------- Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-08-11 09:12:13 -04:00
Ronald	94ec4eb2f4	copyright headers for agent folder (#16704) * copyright headers for agent folder * Ignore test data files * fix proto files and remove headers in agent/uiserver folder * ignore deep-copy files	2023-03-28 14:39:22 -04:00
Chris S. Kim	bde57c0dd0	Regenerate files according to 1.19.2 formatter	2022-10-24 16:12:08 -04:00
Daniel Nephin	b058845110	sdk: add TestLogLevel for setting log level in tests And default log level to WARN.	2022-02-03 13:42:28 -05:00
R.B. Boyer	b1605639fc	light refactors to support making partitions and serf-based wan federation mutually exclusive (#11755)	2021-12-06 13:18:02 -06:00
R.B. Boyer	e20e6348dd	areas: make the gRPC server tracker network area aware (#11748) Fixes a bug whereby servers present in multiple network areas would be properly segmented in the Router, but not in the gRPC mirror. This would lead to servers in the current datacenter that leave a network area (possibly during the network area's removal) deleting their own records that still exist in the standard WAN area. The gRPC client stack uses the gRPC server tracker to execute all RPCs, even those targeting members of the current datacenter (which is unlike the net/rpc stack which has a bypass mechanism). This would manifest as a gRPC method call never opening a socket because it would block forever waiting for the current datacenter's pool of servers to be non-empty.	2021-12-06 09:55:54 -06:00
Mohammad Banikazemi	bcadd341eb	Correcting the changed function name in comment Signed-off-by: Mohammad Banikazemi <mbanikazemi@gmail.com>	2021-02-06 20:23:40 -05:00
Daniel Nephin	b9e60c0775	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
Daniel Nephin	0003720f78	agent/router: refactor calculation of delay between rebalances. This change attempts to make the delay logic more obvious by: * remove indirection, inline a bunch of function calls * move all the code and constants next to each other * replace the two constant values with a single value * reword the comments.	2020-10-15 15:59:36 -04:00
Daniel Nephin	119c446cf2	agent/router: Add bounds test cases	2020-10-15 14:43:29 -04:00
Daniel Nephin	12e174900b	router: organize the test by number of servers And add some additional cases to show where the minimum value stops being used	2020-10-15 13:53:37 -04:00
Daniel Nephin	8697cc2b45	router: make refreshServerRebalanceTimer test a lot more strict	2020-10-15 12:05:07 -04:00
Daniel Nephin	2294793357	agent/grpc: use router.Manager to handle the rebalance The router.Manager is already rebalancing servers for other connection pools, so it can call into our resolver to do the same. This change allows us to remove the serf dependency from resolverBuilder, and remove Datacenter from the config. Also revert the change to refreshServerRebalanceTimer	2020-09-24 12:53:14 -04:00
Daniel Nephin	07b4507f1e	router: remove grpcServerTracker from managers It only needs to be referenced from the Router, because there is only 1 instance, and the Router can call AddServer/RemoveServer like it does on the Manager.	2020-09-24 12:53:14 -04:00
Daniel Nephin	f936ca5aea	grpc: client conn pool and resolver Extracted from `936522a13c` Co-authored-by: Paul Banks <banks@banksco.de>	2020-09-24 12:46:22 -04:00
Pierre Souchay	638dcd3360	[BUGFIX] Avoid GetDatacenter* methods to flood Consul servers logs When calling `GetDatacentersByDistance()` or `GetDatacentersMap()`, an incorrect condition was used to display a log message, thus flooding Consul's logs. Example of message: ``` [WARN] agent.router: Non-server in server-only area: non_server=myClientNode area=lan ``` This message is only valid for WAN areas, filter to avoid creating hundreds of logs/s on our clusters, each time someone is calling this method. Our logs were flooded by such messages when migrating our Consul servers from 1.7.7 to 1.8.4. This will fix issue #8663	2020-09-15 11:54:59 +02:00
Matt Keeler	f97cc0445a	Move RPC router from Client/Server and into BaseDeps (#8559) This will allow it to be a shared component which is needed for AutoConfig	2020-08-27 11:23:52 -04:00
Hans Hasselberg	aff02198d7	Refactor keyring ops: * changes some functions to return data instead of modifying pointer arguments * renames globalRPC() to keyringRPCs() to make its purpose more clear * restructures KeyringOperation() to make it more understandable	2020-08-11 13:42:03 +02:00
Hans Hasselberg	51a8e15cf8	Mark its own cluster as healthy when rebalancing. (#8406) This code started as an optimization to avoid doing an RPC Ping to itself. But in a single server cluster the rebalancing was led to believe that there were no healthy servers because foundHealthyServer was not set. Now this is being set properly. Fixes #8401 and #8403.	2020-08-06 10:42:09 +02:00
Daniel Nephin	068b43df90	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Hans Hasselberg	0f343332da	Merge pull request #7966 from hashicorp/pool_improvements Agent connection pool cleanup	2020-06-04 08:56:26 +02:00
Matt Keeler	0e4c65d422	Fix segfault due to race condition for checking server versions (#7957) The ACL monitoring routine uses c.routers to check for server version updates. Therefore it needs to be started after initializing the routers.	2020-06-03 10:36:32 -04:00
Hans Hasselberg	c45432014b	pool: remove version The version field has been used to decide which multiplexing to use. It was introduced in `2457293dce`. But this is 6y ago and there is no need for this differentiation anymore.	2020-05-28 23:06:01 +02:00
Daniel Nephin	c88fae0aac	ci: Add staticcheck and fix most errors Three of the checks are temporarily disabled to limit the size of the diff, and allow us to enable all the other checks in CI. In a follow up we can fix the issues reported by the other checks one at a time, and enable them.	2020-05-28 11:59:58 -04:00
Daniel Nephin	c662f0f0de	Fix a number of problems found by staticcheck Some of these problems are minor (unused vars), but others are real bugs (ignored errors). Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>	2020-05-19 16:50:14 -04:00
Hans Hasselberg	096a2f2f02	network_segments: stop advertising segment tags	2020-05-05 21:32:05 +02:00
Hans Hasselberg	995a24b8e4	agent: refactor to use a single addrFn	2020-05-05 21:08:10 +02:00
Hans Hasselberg	6994c0d47f	agent: rename local/global to src/dst	2020-05-05 21:07:34 +02:00
Hans Hasselberg	51549bd232	rpc: oss changes for network area connection pooling (#7735)	2020-04-30 22:12:17 +02:00
Pierre Souchay	5e79efc80f	Fixed comment on wrong line. While investigating and fixing an issue on our 1.5.1 branch, I saw you also/already fixed the bug I found (tags not updated for existing servers), but comment is misplaced.	2020-04-24 01:15:15 +02:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	61d8778210	Sync some feature flag support from enterprise (#7167)	2020-01-29 13:21:38 -05:00
Chris Piraino	401221de58	Allow users to configure either unstructured or JSON logging (#7130) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Hans Hasselberg	b6499fe6b8	Do not surface left servers (#6420) * do not surface left servers in catalog	2019-10-08 22:16:00 -05:00
Pierre Souchay	be50400c62	Distinguish between DC not existing and not being available (#6399)	2019-09-03 09:46:24 -06:00
Christian Muehlhaeuser	7753b97cc7	Simplified code in various places (#6176) All these changes should have no side-effects or change behavior: - Use bytes.Buffer's String() instead of a conversion - Use time.Since and time.Until where fitting - Drop unnecessary returns and assignment	2019-07-20 09:37:19 -04:00
Matt Keeler	200c0fb3e9	Call RemoveServer for reap events (#5317) This ensures that servers are removed from RPC routing when they are reaped.	2019-03-04 09:19:35 -05:00
Josh Soref	94835a2715	Spelling (#3958 ) * spelling: another * spelling: autopilot * spelling: beginning * spelling: circonus * spelling: default * spelling: definition * spelling: distance * spelling: encountered * spelling: enterprise * spelling: expands * spelling: exits * spelling: formatting * spelling: health * spelling: hierarchy * spelling: imposed * spelling: independence * spelling: inspect * spelling: last * spelling: latest * spelling: client * spelling: message * spelling: minimum * spelling: notify * spelling: nonexistent * spelling: operator * spelling: payload * spelling: preceded * spelling: prepared * spelling: programmatically * spelling: required * spelling: reconcile * spelling: responses * spelling: request * spelling: response * spelling: results * spelling: retrieve * spelling: service * spelling: significantly * spelling: specifies * spelling: supported * spelling: synchronization * spelling: synchronous * spelling: themselves * spelling: unexpected * spelling: validations * spelling: value	2018-03-19 16:56:00 +00:00
James Phillips	d9a6e2a901	Makes server manager shift away from failed servers from Serf events. Because this code was doing pointer equality checks, it would work for the case of a failed attempted RPC because the objects are from the manager itself: https://github.com/hashicorp/consul/blob/v1.0.3/agent/consul/rpc.go#L283-L302 But the pointer check would always fail for events coming in from the Serf path because the server object is newly-created: https://github.com/hashicorp/consul/blob/v1.0.3/agent/router/serf_adapter.go#L14-L40 This means that we didn't proactively shift RPC traffic away from a failed server, we'd have to wait for an RPC to fail, which exposes the error to the calling client. By switching over to a name check vs. a pointer check we get the correct behavior. We added a DEBUG log as well to help observe this behavior during integrated testing. Related to #3863 since the fix here needed the same logic duplicated, owing to the complicated atomic stuff. /cc @dadgar for a heads up in case this also affects Nomad.	2018-02-05 17:56:00 -08:00
James Phillips	85e678fbdd	Saves the cycled server list after a failed ping when rebalancing. (#3662) Fixes #3463	2017-11-07 18:13:23 -08:00
Kyle Havlovitz	14b027a3c2	Add segment addr field to tags for LAN flood joiner	2017-08-30 11:58:29 -07:00
Frank Schroeder	16c58da27d	agent: drop unused code This code from http://github.com/hashicorp/consul/pull/3353 is no longer required.	2017-08-22 00:02:46 +02:00
Frank Schroeder	7cff50a4df	agent: move agent/consul/agent to agent/metadata	2017-08-09 14:36:52 +02:00
Frank Schroeder	c395599cea	agent: move agent/consul/servers to agent/router	2017-08-09 14:36:37 +02:00

45 Commits