consul

Commit Graph

Author	SHA1	Message	Date
Dhia Ayachi	d24156db14	generate a single debug file for a long duration capture (#10279) * debug: remove the CLI check for debug_enabled The API allows collecting profiles even debug_enabled=false as long as ACLs are enabled. Remove this check from the CLI so that users do not need to set debug_enabled=true for no reason. Also: - fix the API client to return errors on non-200 status codes for debug endpoints - improve the failure messages when pprof data can not be collected Co-Authored-By: Dhia Ayachi <dhia@hashicorp.com> * remove parallel test runs parallel runs create a race condition that fail the debug tests * snapshot the timestamp at the beginning of the capture - timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures - capture append to the file if it already exist * Revert "snapshot the timestamp at the beginning of the capture" This reverts commit c2d03346 * Refactor captureDynamic to extract capture logic for each item in a different func * snapshot the timestamp at the beginning of the capture - timestamp used to create the capture sub folder is snapshot only at the beginning of the capture and reused for subsequent captures - capture append to the file if it already exist * Revert "snapshot the timestamp at the beginning of the capture" This reverts commit c2d03346 * Refactor captureDynamic to extract capture logic for each item in a different func * extract wait group outside the go routine to avoid a race condition * capture pprof in a separate go routine * perform a single capture for pprof data for the whole duration * add missing vendor dependency * add a change log and fix documentation to reflect the change * create function for timestamp dir creation and simplify error handling * use error groups and ticker to simplify interval capture loop * Logs, profile and traces are captured for the full duration. Metrics, Heap and Go routines are captured every interval * refactor Logs capture routine and add log capture specific test * improve error reporting when log test fail * change test duration to 1s * make time parsing in log line more robust * refactor log time format in a const * test on log line empty the earliest possible and return Co-authored-by: Freddy <freddygv@users.noreply.github.com> * rename function to captureShortLived * more specific changelog Co-authored-by: Paul Banks <banks@banksco.de> * update documentation to reflect current implementation * add test for behavior when invalid param is passed to the command * fix argument line in test * a more detailed description of the new behaviour Co-authored-by: Paul Banks <banks@banksco.de> * print success right after the capture is done * remove an unnecessary error check Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> * upgraded github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57 => v0.0.0-20210601050228-01bbb1931b22 Co-authored-by: Daniel Nephin <dnephin@hashicorp.com> Co-authored-by: Freddy <freddygv@users.noreply.github.com> Co-authored-by: Paul Banks <banks@banksco.de>	2021-06-07 17:12:54 +00:00
Matt Keeler	ada4d21285	Bump raft-autopilot to the latest version (#10310)	2021-05-27 13:23:18 -04:00
R.B. Boyer	0e7ab74f17	[1.9.x] mod: bump to github.com/hashicorp/mdns v1.0.4 (#10019) backport of #10018 to 1.9.x	2021-04-14 14:42:10 -05:00
Matt Keeler	ab1e689c4a	Upgrade raft-autopilot and wait for autopilot to stop when revoking leadership (#9644) Fixes: 9626	2021-01-27 16:15:37 +00:00
R.B. Boyer	41211a82d8	chore: run 'make update-vendor' (#9614)	2021-01-25 17:41:31 +00:00
John Cowen	cdb1730a21	Fix -ui-content-path without regex (#9569) * Add templating to inject JSON into an application/json script tag Plus an external script in order to pick it out and inject the values we need injecting into ember's environment meta tag. The UI still uses env style naming (CONSUL_*) but we use the new style JSON/golang props behind the scenes. Co-authored-by: Paul Banks <banks@banksco.de>	2021-01-20 18:48:32 +00:00
Kit Patella	d28d86a56f	Merge pull request #9510 from pierresouchay/prometheus_metrics_help_duplicate_fix [bugfix] Prometheus metrics without warnings	2021-01-06 18:53:33 +00:00
Matt Keeler	8539565046	Merge pull request #9103 from hashicorp/feature/autopilot-mod Switch to using the external autopilot module	2020-11-09 16:30:48 +00:00
Mike Morris	4f1d2a1c56	chore: upgrade to gopsutil/v3 (#9118) * deps: update golang.org/x/sys * deps: update imports to gopsutil/v3 * chore: make update-vendor	2020-11-07 01:49:01 +00:00
Kit Patella	5d9240d6ff	Merge pull request #9088 from hashicorp/mkcp/telemetry/add-key-metrics-definitions Add prometheus definitions for key metrics.	2020-11-06 18:45:49 +00:00
Kyle Havlovitz	4abe96aa74	vendor: Update github.com/hashicorp/yamux	2020-10-09 05:05:46 -07:00
Kyle Havlovitz	dd6ed08924	vendor: Update github.com/hashicorp/mdns	2020-10-09 04:43:27 -07:00
Kyle Havlovitz	1cc012b202	vendor: Update github.com/hashicorp/hil	2020-10-09 04:43:27 -07:00
Kyle Havlovitz	b95ab0d33c	vendor: Update github.com/hashicorp/go-version	2020-10-09 04:43:27 -07:00
Kyle Havlovitz	f389f1184d	vendor: Update github.com/hashicorp/go-memdb	2020-10-09 04:43:27 -07:00
Kyle Havlovitz	40481e2b8f	vendor: Update github.com/hashicorp/go-checkpoint	2020-10-09 04:43:27 -07:00
Mike Morris	708957a982	chore: update raft to v1.2.0 (#8822)	2020-10-08 15:07:10 -04:00
Matt Keeler	38f5ddce2a	Add per-agent reconnect timeouts (#8781) This allows client agents to be run in a more stateless manner where they may be abruptly terminated and not expected to come back. If advertising a per-agent reconnect timeout using the advertise_reconnect_timeout configuration when that agent leaves, other agents will wait only that amount of time for the agent to come back before reaping it. This has the advantageous side effect of causing servers to deregister the node/services/checks for that agent sooner than if the global reconnect_timeout was used.	2020-10-08 15:02:19 -04:00
Mike Morris	1ebc2fb006	chore(deps): update gopsutil to v2.20.9 (#8843) * core(deps): bump golang.org/x/sys To resolve /go/pkg/mod/github.com/shirou/gopsutil@v2.20.9+incompatible/host/host_bsd.go:20:13: undefined: unix.SysctlTimeval * chore(deps): make update-vendor	2020-10-07 12:57:18 -04:00
Daniel Nephin	627449a870	Vendor gofuzz and google/go-cmp	2020-09-28 18:28:37 -04:00
Kyle Havlovitz	b1b21139ca	Merge branch 'master' into vault-ca-renew-token	2020-09-15 14:39:04 -07:00
Kyle Havlovitz	1cd7c43544	Update vault CA for latest api client	2020-09-15 13:33:55 -07:00
Kyle Havlovitz	74dc50a771	vendor: Update vault api package	2020-09-15 12:45:29 -07:00
Daniel Nephin	0c87cf468c	Update go-metrics dependencies, to use metrics.Default()	2020-09-14 19:05:22 -04:00
R.B. Boyer	74d5df7c7a	xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569)	2020-08-27 12:20:58 -05:00
Hans Hasselberg	a932aafc91	add primary keys to list keyring (#8522) During gossip encryption key rotation it would be nice to be able to see if all nodes are using the same key. This PR adds another field to the json response from `GET v1/operator/keyring` which lists the primary keys in use per dc. That way an operator can tell when a key was successfully setup as primary key. Based on https://github.com/hashicorp/serf/pull/611 to add primary key to list keyring output: ```json [ { "WAN": true, "Datacenter": "dc2", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 6, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 6 }, "NumNodes": 6 }, { "WAN": false, "Datacenter": "dc2", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 8, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "NumNodes": 8 }, { "WAN": false, "Datacenter": "dc1", "Segment": "", "Keys": { "0OuM4oC3Os18OblWiBbZUaHA7Hk+tNs/6nhNYtaNduM=": 3, "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "PrimaryKeys": { "SINm887hKTzmMWeBNKTJReaTLX3mBEJKriDyt88Ad+g=": 8 }, "NumNodes": 8 } ] ``` I intentionally did not change the CLI output because I didn't find a good way of displaying this information. There are a couple of options that we could implement later: * add a flag to show the primary keys * add a flag to show json output Fixes #3393.	2020-08-18 09:50:24 +02:00
s-christoff	102b7e55da	Update Go-Metrics 0.3.4 (#8478)	2020-08-11 11:17:43 -05:00
Kyle Havlovitz	f4efd53d57	vendor: Update github.com/armon/go-metrics to v0.3.3	2020-07-23 11:37:33 -07:00
Matt Keeler	a6a1a0e3d6	Update mapstructure to v1.3.3 (#8361) This was done in preparation for another PR where I was running into https://github.com/mitchellh/mapstructure/issues/202 and implemented a fix for the library.	2020-07-22 15:13:21 -04:00
R.B. Boyer	e853368c23	gossip: Avoid issue where two unique leave events for the same node could lead to infinite rebroadcast storms (#8343) bump serf to v0.9.3 to include fix for https://github.com/hashicorp/serf/pull/606	2020-07-21 15:48:10 -05:00
Pierre Souchay	20d1ea7d2d	Upgrade go-connlimit to v0.3.0 / return http 429 on too many connections (#8221) Fixes #7527 I want to highlight this and explain what I think the implications are and make sure we are aware: * `HTTPConnStateFunc` closes the connection when it is beyond the limit. `Close` does not block. * `HTTPConnStateFuncWithDefault429Handler(10 * time.Millisecond)` blocks until the following is done (worst case): 1) `conn.SetDeadline(10*time.Millisecond)` so that 2) `conn.Write(429error)` is guaranteed to timeout after 10ms, so that the http 429 can be written and 3) `conn.Close` can happen The implication of this change is that accepting any new connection is worst case delayed by 10ms. But only after a client reached the limit already.	2020-07-03 09:25:07 +02:00
Hans Hasselberg	95c027a3ea	Update gopsutil (#8208) https://github.com/shirou/gopsutil/pull/895 is merged and fixes our problem. Time to update. Since there is no new version just yet, updating to the sha.	2020-07-01 14:47:56 +02:00
Matt Keeler	e9835610f3	Add a test for go routine leaks This is in its own separate package so that it will be a separate test binary that runs thus isolating the go runtime from other tests and allowing accurate go routine leak checking. This test would ideally use goleak.VerifyTestMain but that will fail 100% of the time due to some architectural things (blocking queries and net/rpc uncancellability). This test is not comprehensive. We should enable/exercise more features and more cluster configurations. However, it's a start.	2020-06-24 17:09:50 -04:00
R.B. Boyer	c63c994b04	connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8165)	2020-06-23 15:19:56 -05:00
Daniel Nephin	9f5a9b2150	Update go-memdb and go-lru dependencies	2020-06-16 13:00:28 -04:00
Daniel Nephin	98d271bee9	Update google.golang.org/api and stretchr/testify To match the versions used in enterprise, should slightly reduce the chances of getting a merge conflict when using `go.mod`.	2020-06-09 16:03:05 -04:00
Daniel Nephin	db74f09b6b	Update protobuf and golang.org/x/... vendor Partially extracted from #7547 Updates protobuf to the most recent in the 1.3.x series, and updates golang.org/x/sys to a7d97aace0b0 because https://github.com/shirou/gopsutil/issues/853 prevents updating to a more recent version. This breaking change in x/sys also prevents us from getting a newer version of x/net. In the future, if gopsutil is not patched, we may want to run a fork version of gopsutil so that we can update both x/net and x/sys.	2020-06-09 14:46:41 -04:00
Daniel Nephin	99eb583ebc	Replace goe/verify.Values with testify/require.Equal (#7993) * testing: replace most goe/verify.Values with require.Equal One difference between these two comparisons is that goe/verify considers nil slices/maps to be equal to empty slices/maps, whereas testify/require does not, and does not appear to provide any way to enable that behaviour. Because of this difference some expected values were changed from empty slices to nil slices, and some calls to verify.Values were left. * Remove github.com/pascaldekloe/goe/verify Reduce the number of assertion packages we use from 2 to 1	2020-06-02 12:41:25 -04:00
R.B. Boyer	1efafd7523	acl: add auth method for JWTs (#7846)	2020-05-11 20:59:29 -05:00
Mike Morris	291e6af33a	vendor: revert golang.org/x/sys bump to avoid FreeBSD regression (#7780)	2020-05-05 09:26:17 +02:00
Hans Hasselberg	b5eab19183	vendor: fix case issue (#7777)	2020-05-04 21:39:01 +02:00
Hans Hasselberg	c4093c87cc	agent: don't let left nodes hold onto their node-id (#7747)	2020-05-04 18:39:08 +02:00
Matt Keeler	daec810e34	Merge pull request #7714 from hashicorp/oss-sync/msp-agent-token	2020-05-04 11:33:50 -04:00
Matt Keeler	55050beedb	Update go-discover dependency (#7731)	2020-05-04 10:59:48 -04:00
Matt Keeler	8c545b5206	Update mapstructure to v1.2.3 This release contains a fix to prevent duplicate keys in the Metadata after decoding where the output value contains pointer fields.	2020-04-28 09:33:16 -04:00
R.B. Boyer	b989967791	cli: ensure that 'snapshot save' is fsync safe and also only writes to the requested file on success (#7698)	2020-04-24 17:34:47 -05:00
R.B. Boyer	5f1518c37c	cli: fix usage of gzip.Reader to better detect corrupt snapshots during save/restore (#7697)	2020-04-24 17:18:56 -05:00
Daniel Nephin	50d73c2674	Update github.com/joyent/triton-go to latest There was an RSA private key used for testing included in the old version. This commit updates it to a version that does not include the key so that the key is not detected by tools which scan the Consul binary for private keys. Commands run: go get github.com/joyent/triton-go@6801d15b779f042cfd821c8a41ef80fc33af9d47 make update-vendor	2020-04-16 12:34:29 -04:00
Daniel Nephin	5023a3b178	cli: send requested help text to stdout This behaviour matches the GNU CLI standard: http://www.gnu.org/prep/standards/html_node/_002d_002dhelp.html	2020-03-26 15:27:34 -04:00
R.B. Boyer	6adad71125	wan federation via mesh gateways (#6884) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
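
Commit a932aafc91 in the table above adds a `PrimaryKeys` field to the `GET v1/operator/keyring` response so an operator can tell when a key has been set up as primary. A minimal sketch of how a client might check that follows. The `KeyringResponse` struct here is a local type that only mirrors the field names in the commit's sample JSON, `primaryKeyConverged` is a hypothetical helper, and the key strings are shortened placeholders. "Exactly one primary key, reported by every node" is one possible reading of "successfully set up", not something the commit defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeyringResponse is a local type mirroring the fields in the sample
// response from commit a932aafc91 (an assumption, not the official client type).
type KeyringResponse struct {
	WAN         bool
	Datacenter  string
	Segment     string
	Keys        map[string]int
	PrimaryKeys map[string]int
	NumNodes    int
}

// primaryKeyConverged (hypothetical helper) reports whether the pool has
// exactly one primary key and every node in the pool reports it.
func primaryKeyConverged(r KeyringResponse) bool {
	if len(r.PrimaryKeys) != 1 {
		return false
	}
	for _, nodes := range r.PrimaryKeys {
		if nodes != r.NumNodes {
			return false
		}
	}
	return true
}

func main() {
	// Placeholder payload shaped like the commit's example; keys are made up.
	body := []byte(`[
	  {"WAN": true,  "Datacenter": "dc2", "Segment": "",
	   "Keys": {"key-a": 6, "key-b": 6},
	   "PrimaryKeys": {"key-b": 6}, "NumNodes": 6},
	  {"WAN": false, "Datacenter": "dc1", "Segment": "",
	   "Keys": {"key-a": 3, "key-b": 8},
	   "PrimaryKeys": {"key-a": 3, "key-b": 5}, "NumNodes": 8}
	]`)

	var pools []KeyringResponse
	if err := json.Unmarshal(body, &pools); err != nil {
		panic(err)
	}
	for _, p := range pools {
		fmt.Printf("dc=%s wan=%v converged=%v\n", p.Datacenter, p.WAN, primaryKeyConverged(p))
	}
}
```

In this made-up payload the WAN pool for dc2 has converged, while the LAN pool for dc1 is still mid-rotation because two different primary keys are in use.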


371 Commits