When the Consul serf health check is failing, this means that the health checks registered with the agent may no longer be correct. Therefore we show a notice to the user when we detect that the serf health check is failing both for the health check listing for nodes and for service instances.
There were a few little things we fixed up whilst we were here:
- We use our @replace decorator to replace an empty Type with serf in the model.
- We noticed that ServiceTags can be null, so we replace that with an empty array.
- We added docs for both our Notice component and the Consul::HealthCheck::List component. Notice now defaults to @type=info.
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
* Add conditionals to Lock Session list items
* Add changelog
* Show ID in details if there is a name to go in title
* Add copy-button if ID is in the title
* Update TTL conditional
* Update .changelog/10121.txt
Co-authored-by: John Cowen <johncowen@users.noreply.github.com>
Co-authored-by: John Cowen <johncowen@users.noreply.github.com>
This fixes the spacing bug in nspaces only by only showing Description if the namespace has one, and removing the extra 2 pixel margin of dds for when dts aren't rendered/don't exist.
* ui: Add support for showing partial lists in ListCollection
* Add CSS for partial 'View more' button, and move all CSS to /components
* Enable partial view for intention permissions
* ui: Loader amends/improvements
1. Create a JS compatible template only 'glimmer' component so we can
use it with or without glimmer.
2. Add a set of `rose` colors.
3. Animate the brand loader to keep it centered when the side
navigation appears.
4. Tweak the color of Consul::Loader to use a 'rose' color.
5. Move everything loader related to the `app/components/` folder and
add docs.
A recent change in 1.9.x inverted the order of these two lines, which caused the
X-Consul-Effective-Consistency header to be missing for the servie health endpoints
* ui: Fix text search for upstream instances
* Clean up predicates for other model types
* Add some docs around DataCollection and searching
* Enable UI Engineering Docs for our preview sites
* Use debug CSS in dev and staging
* WIP reloadable raft config
* Pre-define new raft gauges
* Update go-metrics to change gauge reset behaviour
* Update raft to pull in new metric and reloadable config
* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important
* Update telemetry docs
* Update config and telemetry docs
* Add note to oldestLogAge on when it is visible
* Add changelog entry
* Update website/content/docs/agent/options.mdx
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Give descriptive error if auth method not found
Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.
Now the error will be "auth method blah not found", which is much easier
to debug.
Initially we were loading every potential upstream address into Envoy
and then routing traffic to the logical upstream service. The downside
of this behavior is that traffic meant to go to a specific instance
would be load balanced across ALL instances.
Traffic to specific instance IPs should be forwarded to the original
destination and if it's a destination in the mesh then we should ensure
the appropriate certificates are used.
This PR makes transparent proxying a Kubernetes-only feature for now
since support for other environments requires generating virtual IPs,
and Consul does not do that at the moment.
* Update header logo and inline icon
* Update full logos + layout on loading screen
* Update favicon assets and strategy
- Switches to serve an ico file alongside an SVG file
- Introduces an apple-touch-icon
* Removes unused favicon/meta assets
* Changelog item for ui
* Create component for logo
* Simplify logo component, set brand color
* Fix docs loading state CSS issue
The only thing that needed fixing up pertained to this section of the 1.18.x release notes:
> grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default.
For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default.
Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.
This ensures that if someone does include some extension Consul does not currently make use of, that extension is actually usable. Without linking these envoy protobufs into the main binary it can't round trip the escape hatches to send them down to envoy.
Whenenver the go-control-plane library is upgraded next we just have to re-run 'make envoy-library'.
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.
Union of all commit messages follows to give an overarching summary:
xds: exclusively support incremental xDS when using xDS v3
Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support
Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit
xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings
In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.
This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.
xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
* Use proxy outbound port from TransparentProxyConfig if provided
* If -proxy-id is provided to the redirect-traffic command, exclude any listener ports
from inbound traffic redirection. This includes envoy_prometheus_bind_addr,
envoy_stats_bind_addr, and the ListenerPort from the Expose configuration.
* Allow users to provide additional inbound and outbound ports, outbound CIDRs
and additional user IDs to be excluded from traffic redirection.
This affects both the traffic-redirect command and the iptables SDK package.
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.
Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
* CLI: Add support for reading internal raft snapshots to snapshot inspect
* Add snapshot inspect test for raw state files
* Add changelog entry
* Update .changelog/10089.txt
The extra argument meant that the blocking query configuration wasn't
being read properly, and therefore the correct ?index wasn't being sent
with the request.
Previously only a single auth method would be saved to the snapshot. This commit fixes the typo
and adds to the test, to show that all auth methods are now saved.
* add http2 ping checks
* fix test issue
* add h2ping check to config resources
* add new test and docs for h2ping
* fix grammatical inconsistency in H2PING documentation
* resolve rebase conflicts, add test for h2ping tls verification failure
* api documentation for h2ping
* update test config data with H2PING
* add H2PING to protocol buffers and update changelog
* fix typo in changelog entry
* Add new consul connect redirect-traffic command for applying traffic redirection rules when Transparent Proxy is enabled.
* Add new iptables package for applying traffic redirection rules with iptables.
* Fix bug in cache where TTLs are effectively ignored
This mostly affects streaming since streaming will immediately return from Fetch calls when the state is Closed on eviction which causes the race condition every time.
However this also affects all other cache types if the fetch call happens to return between the eviction and then next time around the Get loop by any client.
There is a separate bug that allows cache items to be evicted even when there are active clients which is the trigger here.
* Add changelog entry
* Update .changelog/9978.txt
The streaming cache type for service health has no way to handle v1/health/ingress/:service queries as there is no equivalent topic that would return the appropriate data.
Ensure that attempts to use this endpoint will use the old cache-type for now so that they return appropriate data when streaming is enabled.
* Allow passing ALPN next protocols down to connect services. Fixes#4466.
* Update connect/proxy/proxy_test.go
Co-authored-by: Paul Banks <banks@banksco.de>
Co-authored-by: Paul Banks <banks@banksco.de>
This PR adds support for setting QueryOptions on a few agent API
endpoints. Nomad needs to be able to set the Namespace field on
these endpoints to:
- query for services / checks in a namespace
- deregister services / checks in a namespace
- update TTL status on checks in a namespace
* Configure ember-auto-import so we can use a stricter CSP
* Create a fake filesystem using JSON to avoid inline scripts in index
We used to have inline scripts in index.html in order to support embers
filepath fingerprinting and our configurable rootURL.
Instead of using inline scripts we use application/json plus a JSON blob
to create a fake filesystem JSON blob/hash/map to hold all of the
rootURL'ed fingerprinted file paths which we can then retrive later in
non-inline scripts.
We move our inlined polyfills script into the init.js external script,
and we move the CodeMirror syntax highlighting configuration inline
script into the main app itself - into the already existing CodeMirror
initializer (this has been moved so we can lookup a service located
document using ember's DI container)
* Set a strict-ish CSP policy during development
AutopilotServerHealthy now handles the 429 status code
Previously we would error out and not parse the response. Now either a 200 or 429 status code are considered expected statuses and will result in the method returning the reply allowing API consumers to not only see if the system is healthy or not but which server is unhealthy.
This PR uses the excellent a11y-dialog to implement our modal functionality across the UI.
This package covers all our a11y needs - overlay click and ESC to close, controlling aria-* attributes, focus trap and restore. It's also very small (1.6kb) and has good DOM and JS APIs and also seems to be widely used and well tested.
There is one downside to using this, and that is:
We made use of a very handy characteristic of the relationship between HTML labels and inputs in order to implement our modals previously. Adding a for="id" attribute to a label meant you can control an <input id="id" /> from anywhere else in the page without having to pass javascript objects around. It's just based on using the same string for the for attribute and the id attribute. This allowed us to easily open our login dialog with CSS from anywhere within the UI without having to manage passing around a javascript object/function/method in order to open the dialog.
We've PRed #9813 which includes an approach which would make passing around JS modal object easier to do. But in the meantime we've added a little 'hack' here using an additional <input /> element and a change listener which allows us to keep this label/input characteristic of our old modals. I'd originally thought this would be a temporary amend in order to wait on #9813 but the more I think about it, the more I think its quite a nice thing to keep - so longer term we may/may not keep this.
Allows setting -prometheus-backend-port to configure the cluster
envoy_prometheus_bind_addr points to.
Allows setting -prometheus-scrape-path to configure which path
envoy_prometheus_bind_addr exposes metrics on.
-prometheus-backend-port is used by the consul-k8s metrics merging feature, to
configure envoy_prometheus_bind_addr to point to the merged metrics
endpoint that combines Envoy and service metrics so that one set of
annotations on a Pod can scrape metrics from the service and it's Envoy
sidecar.
-prometheus-scrape-path is used to allow configurability of the path
where prometheus metrics are exposed on envoy_prometheus_bind_addr.
Previous to this commit, the API response would include Gateway
Addresses in the form `domain.name.:8080`, which due to the addition of
the port is probably not the expected response.
This commit rightTrims any `.` characters from the end of the domain
before formatting the address to include the port resulting in
`domain.name:8080`
Note that this does NOT upgrade to xDS v3. That will come in a future PR.
Additionally:
- Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated.
- Shuffled some agent/xds imports in advance of a later xDS v3 upgrade.
- Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR.
Fixes#8425
When de-registering in anti-entropy sync, when there is no service or
check token.
The agent token will fall back to the default (aka user) token if no agent
token is set, so the existing behaviour still works, but it will prefer
the agent token over the user token if both are set.
ref: https://www.consul.io/docs/agent/options#acl_tokens
The agent token seems more approrpiate in this case, since this is an
"internal operation", not something initiated by the user.
This commit use the internal authorize endpoint along wiht ember-can to further restrict user access to certain UI features and navigational elements depending on the users ACL token
* A GET of the /acl/auth-method/:name endpoint returns the fields
MaxTokenTTL and TokenLocality, while a LIST (/acl/auth-methods) does
not.
The list command returns a filtered subset of the full set. This is
somewhat deliberate, so that secrets aren't shown, but the TTL and
Locality fields aren't (IMO) security critical, and it is useful for
the front end to be able to show them.
For consistency these changes mirror the 'omit empty' and string
representation choices made for the GET call.
This includes changes to the gRPC and API code in the client.
The new output looks similar to this
curl 'http://localhost:8500/v1/acl/auth-methods' | jq '.'
{
"MaxTokenTTL": "8m20s",
"Name": "minikube-ttl-local2",
"Type": "kubernetes",
"Description": "minikube auth method",
"TokenLocality": "local",
"CreateIndex": 530,
"ModifyIndex": 530,
"Namespace": "default"
}
]
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Add changelog
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
Previously a snapshot created as part of a resumse-stream request could have incorrectly
cached the newSnapshotToFollow event. This would cause clients to error because they
received an unexpected framing event.
This fixes an issue where leaf certificates issued in primary
datacenters using Vault as a Connect CA would be reissued very
frequently (every ~20 seconds) because the logic meant to detect root
rotation was errantly triggering.
The hash of the rootCA was being compared against a hash of the
intermediateCA and always failing. This doesn't apply to the Consul
built-in CA provider because there is no intermediate in use in the
primary DC.
This is reminiscent of #6513
In a situation where the mesh gateway is configured to bind to multiple
network interfaces, we use a feature called 'tagged addresses'.
Sometimes an address is duplicated across multiple tags such as 'lan'
and 'lan_ipv4'.
There is code to deduplicate these things when creating envoy listeners,
but that code doesn't ensure that the same tag wins every time. If the
winning tag flaps between xDS discovery requests it will cause the
listener to be drained and replaced.
* CSS for moving from a horizontal main menu to a side/vertical one
* Add <App /> Component and rearrange <HashcorpConsul /> to use it
1. HashicorpConsul now uses <App />
2. <App /> is now translated and adds 'skip to main content' functionality
3. Adds ember-in-viewport addon in order to visibly hide main navigation
items in order to take them out of focus/tabbing
4. Slight amends to the dom service while I was there
This way we only have to wait for the serf barrier to pass once before
we can make use of federation state APIs Without this patch every
restart needs to re-compute the change.
Adds a 'status' for the filtering/searching in the UI, without this its not super clear that you are filtering a recordset due to the menu selections being hidden once closed. You can also use the pills in this status view to delete individual filters.
* Add templating to inject JSON into an application/json script tag
Plus an external script in order to pick it out and inject the values we
need injecting into ember's environment meta tag.
The UI still uses env style naming (CONSUL_*) but we uses the new style
JSON/golang props behind the scenes.
Co-authored-by: Paul Banks <banks@banksco.de>
After fixing that bug I uncovered a couple more:
Fix an issue where we might try to cross sign a cert when we never had a valid root.
Fix a potential issue where reconfiguring the CA could cause either the Vault or AWS PCA CA providers to delete resources that are still required by the new incarnation of the CA.
* ui: Keep track of existing intentions and use those to save changes
Previously we risked overwriting existing data in an intention if we
tried to save an intention without having loaded it first, for example
Description and Metadata would have been overwritten.
This change loads in all the intentions for an origin service so we can
pick off the one we need to save and change to ensure that we don't
overwrite any existing data.
The field was not being included in the cache info key. This would result in a DNS request for
web.service.consul returning the same result as web.ingress.consul, when those results should
not be the same.
* Fix bug in usage metrics that caused a negative count to occur
There were a couple of instances were usage metrics would do the wrong
thing and result in incorrect counts, causing the count to attempt to
decrement below zero and return an error. The usage metrics did not
account for various places where a single transaction could
delete/update/add multiple service instances at once.
We also remove the error when attempting to decrement below zero, and
instead just make sure we do not accidentally underflow the unsigned
integer. This is a more graceful failure than returning an error and not
allowing a transaction to commit.
* Add changelog
This PR is based on the previous work by @snuggie12 in PR #6825. It adds the command consul intention list to list all available intentions. The list functionality for intentions seems a bit overdue as it's just very handy. The web UI cannot list intentions outside of the default namespace, and using the API is sometimes not the friendliest option. ;)
I cherry picked snuggie12's commits who did most of the heavy lifting (thanks again @snuggie12 for your great work!). The changes in the original commit mostly still worked on the current HEAD. On top of that I added support for namespaces and fixed the docs as they are managed differently today. Also the requested changes related to the "Connect" references in the original PRs have been addressed.
Fixes#5652
Co-authored-by: Matt Hoey <mhoey05@jcu.edu>
* Display a warning when rpc.enable_streaming = true is set on a client
This option has no effect when running as an agent
* Added warning when server starts with use_streaming_backend but without rpc.enable_streaming
* Added unit test
This way we only have to wait for the serf barrier to pass once before
we can upgrade to v2 acls. Without this patch every restart needs to
re-compute the change, and potentially if a stray older node joins after
a migration it might regress back to v1 mode which would be problematic.
This PR adds the ns=* query parameter when namespaces are enabled to keep backwards compatibility with how the UI used to work (Intentions page always lists all intention across all namespace you have access to)
I found a tiny dev bug for printing out the current URL during acceptance testing and fixed that up while I was there.
Nodes themselves are not namespaced, so we'd originally assumed we did not need to pass through the ns query parameter when listing or viewing nodes.
As it turns out the API endpoints we use to list and view nodes (and related things) return things that are namespaced, therefore any API requests for nodes do require a the ns query parameter to be passed through to the request.
This PR adds the necessary ns query param to all things Node, apart from the querying for the leader which only returns node related information.
Additionally here we decided to show 0 Services text in the node listing if there are nodes with no service instances within the namespace you are viewing, as this is clearer than showing nothing at all. We also cleaned up/standardized the text we use to in the empty state for service instances.
Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
* server: fix panic when deleting a non existent intention
* add changelog
* Always return an error when deleting non-existent ixn
Co-authored-by: freddygv <gh@freddygv.xyz>
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.
--
This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
This PR updates the tags that we generate for Envoy stats.
Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
Header is: X-Consul-Default-ACL-Policy=<allow|deny>
This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.
Example:
Two config entries written to the primary:
kind=A,name=web,namespace=bar
kind=A,name=web,namespace=foo
Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).
After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:
primary: [
kind=A,name=web,namespace=foo
]
secondary: [
kind=A,name=web,namespace=bar
kind=A,name=web,namespace=foo
]
The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.
During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.
Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.