consul

Commit Graph

Author	SHA1	Message	Date
Daniel Nephin	1a039393f5	config: add HookTranslateKeys This hook replaces lib.TranslateKeys and has a number of advantages: 1. Primarily, aliases for fields are defined on the field itself, making the aliases much easier to maintain, and more obvious to the reader. 2. TranslateKeys translation rules are not aware of structure. It could very easily incorrectly translate a key on one struct that was intended to be a translation rule for a completely different struct, leading to very hard to debug errors. The hook removes the need for the unexpected "translation rule is an empty string to indicate stop traversal" special case. 3. TranslateKeys attempts to duplicate a bunch of tree traversal logic that already exists in mapstructure. Using mapstructure for traversal removes the need to traverse the entire structure multiple times, and makes the behaviour more obvious to the reader. This change is being made to enable a future change of replacing PatchSliceOfMaps. TranslateKeys sits in between PatchSliceOfMaps and mapstructure.Decode, so it must be converted to a hook first, before PatchSliceOfMaps can be replaced by a decode hook.	2020-05-27 16:24:47 -04:00
R.B. Boyer	ddd0a13e27	agent: handle re-bootstrapping in a secondary datacenter when WAN federation via mesh gateways is configured (#7931) The main fix here is to always union the `primary-gateways` list with the list of mesh gateways in the primary returned from the replicated federation states list. This will allow any replicated (incorrect) state to be supplemented with user-configured (correct) state in the config file. Eventually the game of random selection whack-a-mole will pick a winning entry and re-replicate the latest federation states from the primary. If the user-configured state is actually the incorrect one, then the same eventual correct selection process will work in that case, too. The secondary fix is actually to finish making wanfed-via-mgws actually work as originally designed. Once a secondary datacenter has replicated federation states for the primary AND managed to stand up its own local mesh gateways then all of the RPCs from a secondary to the primary SHOULD go through two sets of mesh gateways to arrive in the consul servers in the primary (one hop for the secondary datacenter's mesh gateway, and one hop through the primary datacenter's mesh gateway). This was neglected in the initial implementation. While everything works, ideally we should treat communications that go around the mesh gateways as just provided for bootstrapping purposes. Now we heuristically use the success/failure history of the federation state replicator goroutine loop to determine if our current mesh gateway route is working as intended. If it is, we try using the local gateways, and if those don't work we fall back on trying the primary via the union of the replicated state and the go-discover configuration flags. This can be improved slightly in the future by possibly initializing the gateway choice to local on startup if we already have replicated state. This PR does not address that improvement. Fixes #7339	2020-05-27 11:31:10 -05:00
Kyle Havlovitz	89e6b16815	Filter wildcard gateway services to match listener protocol This now requires some type of protocol setting in ingress gateway tests to ensure the services are not filtered out. - small refactor to add a max(x, y) function - Use internal configEntryTxn function and add MaxUint64 to lib	2020-05-06 15:06:13 -05:00
R.B. Boyer	5f1518c37c	cli: fix usage of gzip.Reader to better detect corrupt snapshots during save/restore (#7697)	2020-04-24 17:18:56 -05:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	d5f9268222	ACL enforcement for the agent/health/services endpoints (#7191) ACL enforcement for the agent/health/services endpoints	2020-01-31 11:16:24 -05:00
R.B. Boyer	cf29bd4dcf	cli: improve the file safety of 'consul tls' subcommands (#7186) - also fixing the signature of file.WriteAtomicWithPerms	2020-01-31 10:12:36 -06:00
Matt Keeler	3a46e1d15f	Make PatchSliceOfMaps case insensitive This fixes some case-sensitivity issues with using camel case in configuration files.	2020-01-31 09:56:02 -05:00
Hans Hasselberg	2ad0831b34	agent: fewer file local differences between enterprise and oss (#6820) (#6898) * Increase number to test ignore. Consul Enterprise has more flags and since we are trying to reduce the differences between both code bases, we are increasing the number in oss. The semantics don't change, it is just a cosmetic thing. * Introduce agent.initEnterprise for enterprise related hooks. * Sync test with ent version. * Fix import order. * revert error wording.	2019-12-06 21:35:58 +01:00
Matt Keeler	a704ebe639	Add Namespace support to the API module and the CLI commands (#6874) Also update the Docs and fixup the HTTP API to return proper errors when someone attempts to use Namespaces with an OSS agent. Add Namespace HTTP API docs Make all API endpoints disallow unknown fields	2019-12-06 11:14:56 -05:00
Mike Morris	65be58703c	connect: remove managed proxies (#6220) * connect: remove managed proxies implementation and all supporting config options and structs * connect: remove deprecated ProxyDestination * command: remove CONNECT_PROXY_TOKEN env var * agent: remove entire proxyprocess proxy manager * test: remove all managed proxy tests * test: remove irrelevant managed proxy note from TestService_ServerTLSConfig * test: update ContentHash to reflect managed proxy removal * test: remove deprecated ProxyDestination test * telemetry: remove managed proxy note * http: remove /v1/agent/connect/proxy endpoint * ci: remove deprecated test exclusion * website: update managed proxies deprecation page to note removal * website: remove managed proxy configuration API docs * website: remove managed proxy note from built-in proxy config * website: add note on removing proxy subdirectory of data_dir	2019-08-09 15:19:30 -04:00
Alvin Huang	ef6b80bab2	resolve circleci config conflicts	2019-07-23 20:18:36 -04:00
Christian Muehlhaeuser	7753b97cc7	Simplified code in various places (#6176) All these changes should have no side-effects or change behavior: - Use bytes.Buffer's String() instead of a conversion - Use time.Since and time.Until where fitting - Drop unnecessary returns and assignment	2019-07-20 09:37:19 -04:00
R.B. Boyer	67a36e3452	handle structs.ConfigEntry decoding similarly to api.ConfigEntry decoding (#6106) Both 'consul config write' and server bootstrap config entries take a decoding detour through mapstructure on the way from HCL to an actual struct. They both may take in snake_case or CamelCase (for consistency) so need very similar handling. Unfortunately since they are operating on mirror universes of structs (api.* vs structs.*) the code cannot be identical, so try to share the kind-configuration and duplicate the rest for now.	2019-07-12 12:20:30 -05:00
R.B. Boyer	38d76c624e	Allow for both snake_case and CamelCase for config entries written with 'consul config write'. (#6044) This also has the added benefit of fixing an issue with passing time.Duration fields through config entries.	2019-06-28 11:35:35 -05:00
Matt Keeler	95d44e0110	Allow MapWalk to handle []interface{} elements that are []uint8 (#5800) * Allow MapWalk to handle []interface{} elements that are []uint8 * Ensure ints are left alone.	2019-05-07 11:40:48 -04:00
Matt Keeler	0ac6b6faba	Fix up the MapWalk function so that it properly handles nested map[interface{}]interface{} (#5774)	2019-05-02 14:43:54 -04:00
Paul Banks	8f5b16ebaf	Fix uint8 conversion issues for service config response maps.	2019-05-02 14:11:33 +01:00
Matt Keeler	d0f410cd84	Make a few config entry endpoints return 404s and allow for snake_case and lowercase key names. (#5748)	2019-04-30 18:19:19 -04:00
Kyle Havlovitz	aba54cec55	Add HTTP endpoints for config entry management (#5718)	2019-04-29 18:08:09 -04:00
Matt Keeler	5befe0f5d5	Implement config entry replication (#5706)	2019-04-26 13:38:39 -04:00
Jeff Mitchell	47c390025b	Convert to Go Modules (#5517) * First conversion * Use serf 0.8.2 tag and associated updated deps * * Move freeport and testutil into internal/ * Make internal/ its own module * Update imports * Add replace statements so API and normal Consul code are self-referencing for ease of development * Adapt to newer goe/values * Bump to new cleanhttp * Fix ban nonprintable chars test * Update lock bad args test The error message when the duration cannot be parsed changed in Go 1.12 (ae0c435877d3aacb9af5e706c40f9dddde5d3e67). This updates that test. * Update another test as well * Bump travis * Bump circleci * Bump go-discover and godo to get rid of launchpad dep * Bump dockerfile go version * fix tar command * Bump go-cleanhttp	2019-03-26 17:04:58 -04:00
R.B. Boyer	f4a3b9d518	fix typos reported by golangci-lint:misspell (#5434)	2019-03-06 11:13:28 -06:00
Matt Keeler	118adbb123	ACL Token Persistence and Reloading (#5328) This PR adds two features which will be useful for operators when ACLs are in use. 1. Tokens set in configuration files are now reloadable. 2. If `acl.enable_token_persistence` is set to `true` in the configuration, tokens set via the `v1/agent/token` endpoint are now persisted to disk and loaded when the agent starts (or during configuration reload) Note that token persistence is opt-in so our users who do not want tokens on the local disk will see no change. Some other secondary changes: * Refactored a bunch of places where the replication token is retrieved from the token store. This token isn't just for replicating ACLs and now it is named accordingly. * Allowed better paths in the `v1/agent/token/` API. Instead of paths like: `v1/agent/token/acl_replication_token` the path can now be just `v1/agent/token/replication`. The old paths remain valid. * Added a couple new API functions to set tokens via the new paths. Deprecated the old ones and pointed to the new names. The names are also generally better and don't imply that what you are setting is for ACLs but rather are setting ACL tokens. There is a minor semantic difference there especially for the replication token as again, it's no longer used only for ACL token/policy replication. The new functions will detect 404s and fallback to using the older token paths when talking to pre-1.4.3 agents. * Docs updated to reflect the API additions and to show using the new endpoints. * Updated the ACL CLI set-agent-tokens command to use the non-deprecated APIs.	2019-02-27 14:28:31 -05:00
Paul Banks	ef9f27cbc8	connect: tame thundering herd of CSRs on CA rotation (#5228) * Support rate limiting and concurrency limiting CSR requests on servers; handle CA rotations gracefully with jitter and backoff-on-rate-limit in client * Add CSR rate limiting docs * Fix config naming and add tests for new CA configs	2019-01-22 17:19:36 +00:00
Matt Keeler	18b29c45c4	New ACLs (#4791) This PR is almost a complete rewrite of the ACL system within Consul. It brings the features more in line with other HashiCorp products. Obviously there is quite a bit left to do here but most of it is related docs, testing and finishing the last few commands in the CLI. I will update the PR description and check off the todos as I finish them over the next few days/week. Description At a high level this PR is mainly to split ACL tokens from Policies and to split the concepts of Authorization from Identities. A lot of this PR is mostly just to support CRUD operations on ACLTokens and ACLPolicies. These in and of themselves are not particularly interesting. The bigger conceptual changes are in how tokens get resolved, how backwards compatibility is handled and the separation of policy from identity which could lead the way to allowing for alternative identity providers. On the surface and with a new cluster the ACL system will look very similar to that of Nomad. Both have tokens and policies. Both have local tokens. The ACL management APIs for both are very similar. I even ripped off Nomad's ACL bootstrap resetting procedure. There are a few key differences though. Nomad requires token and policy replication where Consul only requires policy replication with token replication being opt-in. In Consul local tokens only work with token replication being enabled though. All policies in Nomad are globally applicable. In Consul all policies are stored and replicated globally but can be scoped to a subset of the datacenters. This allows for more granular access management. Unlike Nomad, Consul has legacy baggage in the form of the original ACL system. The ramifications of this are: A server running the new system must still support other clients using the legacy system. A client running the new system must be able to use the legacy RPCs when the servers in its datacenter are running the legacy system. The primary ACL DC's servers running in legacy mode needs to be a gate that keeps everything else in the entire multi-DC cluster running in legacy mode. So not only does this PR implement the new ACL system but has a legacy mode built in for when the cluster isn't ready for new ACLs. Also detecting that new ACLs can be used is automatic and requires no configuration on the part of administrators. This process is detailed more in the "Transitioning from Legacy to New ACL Mode" section below.	2018-10-19 12:04:07 -04:00
Paul Banks	c6ef6a61c9	Refactor to use embedded struct.	2018-06-25 12:25:39 -07:00
Paul Banks	32f362bad9	StartupTelemetry => InitTelemetry	2018-06-25 12:25:39 -07:00
Paul Banks	a7038454fd	WIP	2018-06-25 12:25:38 -07:00
Mitchell Hashimoto	e9b8e5d265	lib/file: add tests for WriteAtomic	2018-06-14 09:42:12 -07:00
Mitchell Hashimoto	1e7f253b53	agent/proxy: write pid file whenever the daemon process changes	2018-06-14 09:42:11 -07:00
Seth Vargo	0603cda5ee	Add a helper for generating Consul's user-agent string	2018-05-25 15:50:18 -04:00
Paul Banks	ff37194fc0	Go fmt cleanup	2018-05-11 17:05:19 +01:00
Preetha Appan	fff532cf84	Update serf to pick up clean leave fix	2018-05-04 15:51:55 -05:00
Veselkov Konstantin	7de57ba4de	remove golint warnings	2018-01-28 22:40:13 +04:00
James Phillips	d12e81860f	Moves Serf helper into lib to fix import cycle in consul-enterprise.	2017-12-07 16:57:58 -08:00
James Phillips	fe36ed6412	Bumps freeport's block size. We were seeing some rollover artifacts where something would be shut down so a port could be re-used, but it was still being referenced by some running thing. This gives more time before rolling over.	2017-11-29 18:33:14 -08:00
Alex Dadgar	358e6827cd	Update cluster.go	2017-10-30 16:51:28 -07:00
Alex Dadgar	6d0b9f4dac	Integer division rounding to zero for rate scaling This fixes an issue in which integer division was scaling down to zero.	2017-10-30 16:46:11 -07:00
Alex Dadgar	0fccef237d	Initialize freeport lazily to avoid runtime issues This PR makes freeport initialize lazily rather than using an init method.	2017-10-25 15:14:39 -07:00
Alex Dadgar	17dcbb1912	Make freeport testing friendly This PR allows the caller to decide if they would like to have the calling test fail, have the caller panic on error, or handle the errors themselves.	2017-10-23 16:28:02 -07:00
Frank Schroeder	c94751ad43	test: replace porter tool with freeport lib This patch removes the porter tool which hands out free ports from a given range with a library which does the same thing. The challenge for acquiring free ports in concurrent go test runs is that go packages are tested concurrently and run in separate processes. There has to be some inter-process synchronization in preventing processes allocating the same ports. freeport allocates blocks of ports from a range expected to be not in heavy use and implements a system-wide mutex by binding to the first port of that block for the lifetime of the application. Ports are then provided sequentially from that block and are tested on localhost before being returned as available.	2017-10-21 22:01:09 +02:00
James Phillips	bb12368eac	Makes RPC handling more robust when rolling servers. (#3561) * Adds client-side retry for no leader errors. This paves over the case where the client was connected to the leader when it loses leadership. * Adds a configurable server RPC drain time and a fail-fast path for RPCs. When a server leaves it gets removed from the Raft configuration, so it will never know who the new leader server ends up being. Without this we'd be doomed to wait out the RPC hold timeout and then fail. This makes things fail a little quicker while a server is draining, and since we added a client retry AND since the server doing this has already shut down and left the Serf LAN, clients should retry against some other server. * Makes the RPC hold timeout configurable. * Reorders struct members. * Sets the RPC hold timeout default for test servers. * Bumps the leave drain time up to 5 seconds. * Robustifies retries with a simpler client-side RPC hold. * Reverts untended delete.	2017-10-10 15:19:50 -07:00
James Phillips	b1a15e0c3d	Adds open source side of network segments (feature is Enterprise-only).	2017-08-30 11:58:29 -07:00
Frank Schroeder	3403cd4372	golint: Fix existing comments This needs more work.	2017-04-25 09:26:13 -07:00
James Phillips	7c27ca1f77	Adds missing unit tests and cleans up some router bugs.	2017-03-16 16:42:19 -07:00
James Phillips	1091c7314e	Removes remoteConsuls in favor of the new router. This has the next wave of RTT integration with the router and also factors some common RTT-related helpers out to lib. While we were in here we also got rid of the coordinate disable config so we don't need to deal with the complexity in the router (there was never a user-visible way to disable coordinates).	2017-03-16 16:42:19 -07:00
James Phillips	bd605e330c	Adds basic support for node IDs.	2017-01-17 22:47:59 -08:00
Sean Chittenden	d695bcaae6	Use a cryptographically secure seed `SeededSecurely` is present if someone or something wants to query the way the library was seeded. Obtained from: nomad	2016-05-02 23:52:37 -07:00
Sean Chittenden	da298f527d	Guard against divide by zero in lib.RandomStagger() While I'm at it, add a DurationMinusBufferDomain() function to calculate the min/max for a given call to DurationMinusBuffer() in order to keep the implementation details self-contained.	2016-04-23 13:11:32 -07:00


59 Commits