Commit Graph

11377 Commits

Author SHA1 Message Date
R.B. Boyer 6adad71125
wan federation via mesh gateways (#6884)
This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch:

There are several distinct chunks of code that are affected:

* new flags and config options for the server

* retry join WAN is slightly different

* retry join code is shared to discover primary mesh gateways from secondary datacenters

* because retry join logic runs in the *agent* and the results of that
  operation for primary mesh gateways are needed in the *server* there are
  some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur
  at multiple layers of abstraction just to pass the data down to the right
  layer.

* new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers

* the function signature for RPC dialing picked up a new required field (the
  node name of the destination)

* several new RPCs for manipulating a FederationState object:
  `FederationState:{Apply,Get,List,ListMeshGateways}`

* 3 read-only internal APIs for debugging use to invoke those RPCs from curl

* raft and fsm changes to persist these FederationStates

* replication for FederationStates as they are canonically stored in the
  Primary and replicated to the Secondaries.

* a special derivative of anti-entropy that runs in secondaries to snapshot
  their local mesh gateway `CheckServiceNodes` and sync them into their upstream
  FederationState in the primary (this works in conjunction with the
  replication to distribute addresses for all mesh gateways in all DCs to all
  other DCs)

* a "gateway locator" convenience object to make use of this data to choose
  the addresses of gateways to use for any given RPC or gossip operation to a
  remote DC. This gets data from the "retry join" logic in the agent and also
  directly calls into the FSM.

* RPC (`:8300`) on the server sniffs the first byte of a new connection to
  determine if it's actually doing native TLS. If so it checks the ALPN header
  for protocol determination (just like how the existing system uses the
  type-byte marker).

* 2 new kinds of protocols are exclusively decoded via this native TLS
  mechanism: one for ferrying "packet" operations (udp-like) from the gossip
  layer and one for "stream" operations (tcp-like). The packet operations
  re-use sockets (using length-prefixing) to cut down on TLS re-negotiation
  overhead.

* the server instances specially wrap the `memberlist.NetTransport` when running
  with gateway federation enabled (in a `wanfed.Transport`). The general gist is
  that if it tries to dial a node in the SAME datacenter (deduced by looking
  at the suffix of the node name) there is no change. If dialing a DIFFERENT
  datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh
  gateways to eventually end up in a server's :8300 port.

* a new flag when launching a mesh gateway via `consul connect envoy` to
  indicate that the servers are to be exposed. This sets a special service
  meta when registering the gateway into the catalog.

* `proxycfg/xds` notice this metadata blob to activate additional watches for
  the FederationState objects as well as the location of all of the consul
  servers in that datacenter.

* `xds:` if the extra metadata is in place additional clusters are defined in a
  DC to bulk sink all traffic to another DC's gateways. For the current
  datacenter we listen on a wildcard name (`server.<dc>.consul`) that load
  balances all servers as well as one mini-cluster per node
  (`<node>.server.<dc>.consul`)

* the `consul tls cert create` command got a new flag (`-node`) to help create
  an additional SAN in certs that can be used with this flavor of federation.
2020-03-09 15:59:02 -05:00
Freddy 602aa742d8
Update namespace docs for config entries (#7420) 2020-03-09 14:51:21 -06:00
Dane Harrigan 382d33bb7e
Update envoy.html.md.erb (#7394)
Minor typo
2020-03-09 13:58:29 -04:00
Matt Keeler e3891db55b
Gather instance counts of aggregated services (#7415) 2020-03-09 11:56:19 -04:00
Noel Quiles ba9849bdf8
website:update middleman-hashicorp to 0.3.44 (#7382) 2020-03-09 14:41:58 +01:00
Pierre Souchay 864f7efffa
agent: configuration reload preserves check's statuses for services (#7345)
This fixes issue #7318

Between versions 1.5.2 and 1.5.3, a regression was introduced regarding the health
of services. A patch, #6144, was issued for health checks of nodes, but not for health checks
of services.

What happened on a reload was:

1. save all healthcheck statuses
2. cleanup everything
3. add new services with healthchecks

In step 3, the state of healthchecks was taken into account locally,
but since we had cleaned up in step 2, the state was lost.

This PR introduces the snap parameter, so step 3 can use information from step 1.
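
The three reload steps can be sketched roughly as follows in Go. This is a simplified model and not the agent's code: the map-based state, the `snapshot` and `addCheck` helpers, and the `critical` default for unknown checks are all assumptions made for illustration.

```go
package main

import "fmt"

type checkID string

// snapshot (hypothetical) copies the current status of every check (step 1).
func snapshot(checks map[checkID]string) map[checkID]string {
	snap := make(map[checkID]string, len(checks))
	for id, status := range checks {
		snap[id] = status
	}
	return snap
}

// addCheck (hypothetical) registers a check (step 3). If the snapshot holds a
// previous status for the same ID it is reused; otherwise the check starts in
// an assumed default state of "critical".
func addCheck(checks map[checkID]string, id checkID, snap map[checkID]string) {
	status := "critical"
	if prev, ok := snap[id]; ok {
		status = prev
	}
	checks[id] = status
}

func main() {
	checks := map[checkID]string{"web-http": "passing"}

	snap := snapshot(checks)           // step 1: save all statuses
	checks = map[checkID]string{}      // step 2: clean up everything
	addCheck(checks, "web-http", snap) // step 3: status restored from snap
	addCheck(checks, "db-tcp", snap)   // new check: no previous status

	fmt.Println(checks["web-http"], checks["db-tcp"]) // passing critical
}
```

Passing the snapshot explicitly into step 3 is what keeps the status available after the cleanup in step 2.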
2020-03-09 12:59:41 +01:00
Alex Dzyoba 4137d06f9f
command: change delim in columnize to funny node names (#6652)
When a node name contains the vertical bar symbol, the output of some commands
is garbled because `|` is used as a delimiter in `columnize.SimpleFormat`.

This commit changes the format string to use `\x1f` - the ASCII unit
separator[1] - as a delimiter and also adds a test to cover this case.
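
To see why the delimiter matters, here is a small Go sketch that uses only the standard library rather than the columnize package. The `splitColumns` helper and the sample node name are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// unitSep is the ASCII unit separator, which is very unlikely to appear in a
// node name.
const unitSep = "\x1f"

// splitColumns (hypothetical) splits one formatted line into its columns.
func splitColumns(line, delim string) []string {
	return strings.Split(line, delim)
}

func main() {
	node := "web|01" // a node name containing a vertical bar

	// With "|" as the delimiter the node name is split into two columns.
	fmt.Println(len(splitColumns(node+"|"+"10.0.0.1", "|"))) // 3

	// With the unit separator the line has the intended two columns.
	fmt.Println(len(splitColumns(node+unitSep+"10.0.0.1", unitSep))) // 2
}
```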

Affected commands:

* `consul catalog nodes`
* `consul members`
* `consul operator raft list-peers`
* `consul intention get`

Fixes #3951.

[1]: https://en.wikipedia.org/wiki/Delimiter#Solutions
2020-03-09 11:24:56 +01:00
Hans Hasselberg c46e2ae59b
docs: add docs for kv_max_value_size (#7405)
Apart from the added docs, the error messages are now consistent and
point to the corresponding options.
Fixes #6708.
2020-03-09 11:13:40 +01:00
Johannes Scheuermann f8ded993af
agent: log error when agent crashes in an early stage (#7411) 2020-03-09 10:45:21 +01:00
John Cowen 5f625666b4
ui: Enable recovery from an unreachable datacenter (500 error) (#7404)
For URL maintenance reasons we store the last visited DC in
localStorage in case you come back to a page (for example settings) that
doesn't have a dc in the URL.

A problem arises here if the last DC you tried to visit is unreachable.

The first fix here clears out the last visited DC from localStorage if
the API has errored out.

Secondly, our `href-mut` helper, which mutates the current URL and
replaces 'parts' in the URL rather than the whole thing, functioned by
detecting the current route/URL you are on and 'mutating' that. A problem
arose here as even though you might be on the `/ui/dc-1/services` URL the
actual route is the 'error' route which does not have a URL that can be
changed properly.

The second fix here uses route.currentRoute.name over route.currentRouteName.

The latter is equal to error when an error occurs whereas the former gives you the name of the route before the error happened, which is actually what we want/the intent here.

i.e. when `router.currentRouteName === 'error'` then
`router.currentRoute.name === Name Of Route Before It Errored`, it seems.
2020-03-09 09:10:47 +00:00
Alvin Huang 45bbb6e035
add auto cherry-picking (#7406)
* add auto cherry-picking

* exit on git cherry-pick failure

* release branches are #.#.x
2020-03-06 17:59:14 -05:00
Kim Ngo c5fe112e59
Update CHANGELOG.md 2020-03-05 15:49:26 -06:00
Kim Ngo a8f4123d37
agent/txn_endpoint: configure max txn request length (#7388)
configure max transaction size separately from kv limit
2020-03-05 15:42:37 -06:00
Matt Keeler 0041102e29
Change where the envoy snapshots get put when a test fails (#7298)
This will allow us to capture them in CI
2020-03-05 16:01:10 -05:00
Matt Keeler ebb0ecf5a8
Update CHANGELOG.md 2020-03-05 15:40:01 -05:00
Matt Keeler 7584dfe8c8 Fix session backwards incompatibility with 1.6.x and earlier. 2020-03-05 15:34:55 -05:00
Freddy ee24f4dcc1
1.7 upgrade note (#7397)
The Session API in Consul 1.7.0 and 1.7.1 is incompatible with prior versions of Consul.

This PR adds a note to our version-specific upgrade guide to guard against users upgrading before the fix in 1.7.2 is released.
2020-03-05 13:04:04 -07:00
John Cowen 4befec8f0c
docs: Add that `response_headers` also affects the UI (#7376) 2020-03-05 12:06:35 +00:00
Alvin Huang a24e431c0e
update envoy doc notes (#7389) 2020-03-04 14:59:30 -05:00
John Cowen 6074a1b4c8
ui: Coordinates don't require a nspace, so don't expect one in the repo (#7378)
* ui: Coordinates don't require a nspace, so don't expect one in the repo

* Add a test to prove the correct number of arguments are being passed
2020-03-04 18:12:47 +00:00
John Cowen 468e82dcf9
ui: Alter position of logic for showing the Round Trip Time tab to prevent DOM refresh (#7377)
* ui: Move tomography length check inside of the partial

Previously we checked the length of tomography.distances to decide
whether to show the RTT tab or not. Previous to our ember upgrade this
would not cause a DOM reload of so many elements (i.e. all of the tab
content). Since our ember upgrade, any change to tomography (so not
necessarily the length of distances) seems to fire a change to the length (even if
the length remains the same). The knock on effect of this is that the
array of tab panels seems to be recalculated (but remain the same) and
all of the tab panels are completely re-rendered, causing the scroll of
the page to be reset.

This commit moves the check for tomography.distances.length lower
down, next to the loop, which means the array of tab panels always remains
the same, which consequently means that the entire array of tab panels
is never re-rendered entirely, and therefore fixes the issue.
2020-03-04 18:12:27 +00:00
John Cowen e83fb1882c
Adds http_config.response_headers to the UI headers plus tests (#7369) 2020-03-03 13:18:35 +00:00
Pierre Souchay 2300e2d4ba
agent: take Prometheus MIME-type header into account (#7371)
This avoids having to add format=prometheus to the request and makes it
easier to parse metrics using Prometheus.
This commit follows https://github.com/hashicorp/consul/pull/6514 as
the PR has been closed and extends it by accepting old Prometheus
mime-type.
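
A rough Go sketch of how an `Accept` header could be inspected for this purpose. The `wantsPrometheus` helper is hypothetical, and the MIME types checked (`application/openmetrics-text` and `text/plain` with `version=0.0.4`, the content type of the Prometheus text exposition format) are assumptions, since this message does not list the exact types the agent accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// wantsPrometheus (hypothetical) reports whether the Accept header asks for a
// Prometheus-compatible format. The accepted MIME types are assumptions.
func wantsPrometheus(accept string) bool {
	for _, part := range strings.Split(accept, ",") {
		fields := strings.Split(part, ";")
		mime := strings.ToLower(strings.TrimSpace(fields[0]))
		if mime == "application/openmetrics-text" {
			return true
		}
		if mime == "text/plain" {
			// The older text format is identified by its version parameter.
			for _, param := range fields[1:] {
				if strings.TrimSpace(param) == "version=0.0.4" {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	fmt.Println(wantsPrometheus("text/plain;version=0.0.4;q=0.5,*/*;q=0.1")) // true
	fmt.Println(wantsPrometheus("application/json"))                         // false
}
```

A handler could fall back to the `format=prometheus` query parameter when this check returns false, so existing clients keep working.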
2020-03-03 14:18:19 +01:00
Kyle Havlovitz 7c57837908 Add stub methods for ACL/segment bug fix from enterprise 2020-03-02 10:30:23 -08:00
steven jacobs ca6e866232
docs:add documentation for Linode cloud auto-join (#6719)
The go-discover library supports Linode. This adds support for
discovering other Consul agents running on Linode. Consul has supported
this since [66b8c20][1] was merged, so this commit just updates the
documentation to match current features.

[1]: 66b8c20990
2020-02-27 06:51:21 -05:00
Preetha de366cc5a4
Merge pull request #7340 from hashicorp/docs/change-website-download-v1.7.1
Update Consul version on website to 1.7.1
2020-02-23 21:56:29 -05:00
Blake Covarrubias ab20785210 Update Consul version on website to 1.7.1 2020-02-23 14:04:20 -08:00
Luke Kysow ca6ba769ff
Merge pull request #7207 from hashicorp/namespace-k8s-docs
Docs for consul-k8s namespaces support
2020-02-21 14:05:38 -07:00
John Cowen 74ade640e3
ui: Use WithEventSource mixin on intentions to ensure cleanup (#7333)
The WithEventSource mixin has a reset method when the Controller is
exited which will close any open EventSources/Blocking queries.

This adds it in for intentions
2020-02-21 14:00:33 +00:00
John Cowen 88b69da4c5
Update CHANGELOG.md 2020-02-21 12:01:32 +00:00
Hans Hasselberg 10daa79bbd
Update CHANGELOG.md 2020-02-20 22:28:50 +01:00
Luke Kysow 01e30289d2
Docs for Consul namespaces in kube 2020-02-20 14:27:09 -07:00
Freddy 9be9990646
Update CHANGELOG.md 2020-02-20 09:59:33 -07:00
Jono Sosulska f5920e4832
Merge pull request #7304 from hashicorp/docs/anti-entropy
Added links to Anti-entropy guide + catalog
2020-02-20 11:16:13 -05:00
Hans Hasselberg e05ac57e8f
tls: support tls 1.3 (#7325) 2020-02-19 23:22:31 +01:00
R.B. Boyer fd7e87e551 format changelog 2020-02-19 15:13:42 -06:00
Matt Keeler ce03db9811
Update CHANGELOG.md 2020-02-19 14:42:48 -05:00
Matt Keeler 861f754dad
Properly detect no alt domain set (#7323) 2020-02-19 14:41:43 -05:00
Matt Keeler ae424f25e6
Update CHANGELOG.md 2020-02-19 11:59:18 -05:00
Matt Keeler 4c9577678e
xDS Mesh Gateway Resolver Subset Fixes (#7294)
* xDS Mesh Gateway Resolver Subset Fixes

The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn't care at all but now it will omit generating those clusters unless we also have service instances that should be proxied.

The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subset's filters.

* Refactor the gateway endpoint generation to be a little easier to read
2020-02-19 11:57:55 -05:00
Hans Hasselberg 8f61558e19
hashibot: let hashibot help us more (#7281) 2020-02-19 15:30:27 +01:00
Mike Morris 3532c56c5f Update config.yml 2020-02-18 13:27:48 -05:00
kaitlincarter-hc 707e06e3fe
docs: adding new guide for namespaces and service discovery (#6788) 2020-02-18 18:34:21 +01:00
kaitlincarter-hc e8bbd00c38
docs: setup secure namespaces (#6789)
* Adding new guide for namespaces and ACLs

* Update website/source/docs/guides/secure-namespaces.html.md

Co-Authored-By: Blake Covarrubias <bcovarrubias@hashicorp.com>

Co-authored-by: Hans Hasselberg <me@hans.io>
Co-authored-by: Blake Covarrubias <blake.covarrubias@gmail.com>
2020-02-18 18:33:35 +01:00
John Cowen e85fd18f89
ui: Be more specific with the display toggling checkboxes (#7309) 2020-02-18 17:05:45 +00:00
Matt Keeler 40614b16fd
Update CHANGELOG.md 2020-02-18 11:16:35 -05:00
Matt Keeler 38b6ffb230
Update CHANGELOG.md 2020-02-18 11:14:33 -05:00
rerorero 2630a949f7
fix: Destroying a session that doesn't exist returns status cod… (#6905)
fix #6840
2020-02-18 11:13:15 -05:00
Matt Keeler 096326d2b3
Update CHANGELOG.md 2020-02-18 11:12:45 -05:00
Wim 3a2c865ff6
Fix high cpu usage with IPv6 recursor address. Closes #6120 (#6128) 2020-02-18 11:09:11 -05:00