Previously the tokens would fail to insert into the secondary's state
store because the AuthMethod field of the ACLToken did not point to a
known auth method from the primary.
* create consul version metric with version label
* agent/agent.go: add pre-release Version as well as label
Co-Authored-By: Radha13 <kumari.radha3@gmail.com>
* verion and pre-release version labels.
* hyphen/- breaks prometheus
* Add Prometheus gauge defintion for version metric
* Add new metric to telemetry docs
Co-authored-by: Radha Kumari <kumari.radha3@gmail.com>
Co-authored-by: Aestek <thib.gilles@gmail.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Add a skip condition to all tests slower than 100ms.
This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests
With this change:
```
$ time go test -count=1 -short ./agent
ok github.com/hashicorp/consul/agent 0.743s
real 0m4.791s
$ time go test -count=1 -short ./agent/consul
ok github.com/hashicorp/consul/agent/consul 4.229s
real 0m8.769s
```
* server: fix panic when deleting a non existent intention
* add changelog
* Always return an error when deleting non-existent ixn
Co-authored-by: freddygv <gh@freddygv.xyz>
And remove the devMode field from builder.
This change helps make the Builder state more explicit by moving inputs to the BuilderOps struct,
leaving only fields that can change during Builder.Build on the Builder struct.
Using the LiteralSource makes it much easier to find default values, because an IDE reports
the location of a default. With an HCL string they are harder to discover.
Also removes unnecessary mapstructure.Decodes of constant values.
A vulnerability was identified in Consul and Consul Enterprise (“Consul”) such that operators with `operator:read` ACL permissions are able to read the Consul Connect CA configuration when explicitly configured with the `/v1/connect/ca/configuration` endpoint, including the private key. This allows the user to effectively privilege escalate by enabling the ability to mint certificates for any Consul Connect services. This would potentially allow them to masquerade (receive/send traffic) as any service in the mesh.
--
This PR increases the permissions required to read the Connect CA's private key when it was configured via the `/connect/ca/configuration` endpoint. They are now `operator:write`.
Previously the listener was being passed to a closure in a loop without
capturing the loop variable. The result is only the last listener is
used, so the http/https servers only listen on one address.
This problem is fixed by capturing the variable by passing it into a
function.
This PR updates the tags that we generate for Envoy stats.
Several of these come with breaking changes, since we can't keep two stats prefixes for a filter.
* ci: stop building darwin/386 binaries
Go 1.15 drops support for 32-bit binaries on Darwin https://golang.org/doc/go1.15#darwin
* tls: ConnectionState::NegotiatedProtocolIsMutual is deprecated in Go 1.15, this value is always true
* correct error messages that changed slightly
* Completely regenerate some TLS test data
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
The Intention.Apply RPC is quite large, so this PR attempts to break it down into smaller functions and dissolves the pre-config-entry approach to the breakdown as it only confused things.
Header is: X-Consul-Default-ACL-Policy=<allow|deny>
This is of particular utility when fetching matching intentions, as the
fallthrough for a request that doesn't match any intentions is to
enforce using the default acl policy.
Most packages should pass the race detector. An exclude list ensures
that new packages are automatically tested with -race.
Also fix a couple small test races to allow more packages to be tested.
Returning readyCh requires a lock because it can be set to nil, and
setting it to nil will race without the lock.
Move the TestServer.Listening calls around so that they properly guard
setting TestServer.l. Otherwise it races.
Remove t.Parallel in a small package. The entire package tests run in a
few seconds, so t.Parallel does very little.
In auto-config, wait for the AutoConfig.run goroutine to stop before
calling readPersistedAutoConfig. Without this change there was a data
race on reading ac.config.
defaultMetrics was being set at package import time, which meant that it received an instance of
the original default. But lib/telemetry.InitTelemetry sets a new global when it is called.
This resulted in the metrics being sent nowhere.
This commit changes defaultMetrics to be a function, so it will return the global instance when
called. Since it is called after InitTelemetry it will return the correct metrics instance.
The Catalog, Config Entry, KV and Session resources potentially re-validate the input as its coming in. We need to prevent snapshot restoration failures due to missing namespaces or namespaces that are being deleted in enterprise.
This ensures the metrics proxy endpoint is ACL protected behind a
wildcard `service:read` and `node:read` set of rules. For Consul
Enterprise these will need to span all namespaces:
```
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }
namespace_prefix "" {
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }
}
```
This PR contains just the backend changes. The frontend changes to
actually pass the consul token header to the proxy through the JS plugin
will come in another PR.
Added a new option `ui_config.metrics_proxy.path_allowlist`. This defaults to `["/api/v1/query", "/api/v1/query_range"]` when the metrics provider is set to `prometheus`.
Requests that do not use one of the allow-listed paths (via exact match) get a 403 Forbidden response instead.
1. do a state store query to list intentions as the agent would do over in `agent/proxycfg` backing `agent/xds`
2. upgrade the database and do a fresh `service-intentions` config entry write
3. the blocking query inside of the agent cache in (1) doesn't notice (2)
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
The output of the previous assertions made it impossible to debug the tests without code changes.
With go-cmp comparing the entire slice we can see the full diffs making it easier to debug failures.
Gauge metrics are great for understanding the current state, but can somtimes hide problems
if there are many disconnect/reconnects.
This commit adds counter metrics for connections and streams to make it easier to see the
count of newly created connections and streams.
Instead of using retry.Run, which appears to have problems in some cases where it does not
emit an error message, use a for loop.
Increase the number of attempts and remove any sleep, since this operation is not that expensive to do
in a tight loop
Closing l.conns can lead to a race and a 'panic: send on closed chan' when a
connection is in the middle of being handled when the server is shutting down.
Found using '-race -count=800'
* Plumb Datacenter and Namespace to metrics provider in preparation for them being usable.
* Move metrics loader/status to a new component and show reason for being disabled.
* Remove stray console.log
* Rebuild AssetFS to resolve conflicts
* Yarn upgrade
* mend
Previously config entries sharing a kind & name but in different
namespaces could occasionally cause "stuck states" in replication
because the namespace fields were ignored during the differential
comparison phase.
Example:
Two config entries written to the primary:
kind=A,name=web,namespace=bar
kind=A,name=web,namespace=foo
Under the covers these both get saved to memdb, so they are sorted by
all 3 components (kind,name,namespace) during natural iteration. This
means that before the replication code does it's own incomplete sort,
the underlying data IS sorted by namespace ascending (bar comes before
foo).
After one pass of replication the primary and secondary datacenters have
the same set of config entries present. If
"kind=A,name=web,namespace=bar" were to be deleted, then things get
weird. Before replication the two sides look like:
primary: [
kind=A,name=web,namespace=foo
]
secondary: [
kind=A,name=web,namespace=bar
kind=A,name=web,namespace=foo
]
The differential comparison phase walks these two lists in sorted order
and first compares "kind=A,name=web,namespace=foo" vs
"kind=A,name=web,namespace=bar" and falsely determines they are the SAME
and are thus cause an update of "kind=A,name=web,namespace=foo". Then it
compares "<nothing>" with "kind=A,name=web,namespace=foo" and falsely
determines that the latter should be DELETED.
During reconciliation the deletes are processed before updates, and so
for a brief moment in the secondary "kind=A,name=web,namespace=foo" is
erroneously deleted and then immediately restored.
Unfortunately after this replication phase the final state is identical
to the initial state, so when it loops around again (rate limited) it
repeats the same set of operations indefinitely.
Required also converting some of the transaction functions to WriteTxn
because TxnRO() called the same helper as TxnRW.
This change allows us to return a memdb.Txn for read-only txn instead of
wrapping them with state.txn.
Prepopulate was setting entry.Expiry.HeapIndex to 0. Previously this would result in a call to heap.Fix(0)
which wasn't correct, but was also not really a problem because at worse it would re-notify.
With the recent change to extract cachettl it was changed to call Update(idx), which would have updated
the wrong entry.
A previous commit removed the setting of entry.Expiry so that the HeapIndex would be reported
as -1, and this commit adds a test and handles the -1 heap index.
And remove duplicate notifications.
Instead of performing the check in the heap implementation, check the
index in the higher level interface (Add,Remove,Update) and notify if one
of the relevant indexes is 0.
Instead of the whole map. This should save a lot of time performing reflecting on a large map.
The filter does not change, so there is no reason to re-apply it to older entries.
This change attempts to make the delay logic more obvious by:
* remove indirection, inline a bunch of function calls
* move all the code and constants next to each other
* replace the two constant values with a single value
* reword the comments.
When a service is deregistered, we check whever matching services were
registered as sidecar along with it and deregister them as well.
To determine if a service is indeed a sidecar we check the
structs.ServiceNode.LocallyRegisteredAsSidecar property. However, to
avoid interal API leakage, it is excluded from JSON serialization,
meaning it is not persisted to disk either.
When the agent is restarted, this property lost and sidecars are no
longer deregistered along with their parent service.
To fix this, we now specifically save this property in the persisted
service file.
* Consul Service meta wrongly computes and exposes non_voter meta
In Serf Tags, entreprise members being non-voters use the tag
`nonvoter=1`, not `non_voter = false`, so non-voters in members
were wrongly displayed as voter.
Demonstration:
```
consul members -detailed|grep voter
consul20-hk5 10.200.100.110:8301 alive acls=1,build=1.8.4+ent,dc=hk5,expect=3,ft_fs=1,ft_ns=1,id=xxxxxxxx-5629-08f2-3a79-10a1ab3849d5,nonvoter=1,port=8300,raft_vsn=3,role=consul,segment=<all>,use_tls=1,vsn=2,vsn_max=3,vsn_min=2,wan_join_port=8302
```
* Added changelog
* Added changelog entry
* Remove unused StatsCard component
* Create Card and Stats contextual components with styling
* Send endpoint, item, and protocol to Stats as props
* WIP basic plumbing for metrics in Ember
* WIP metrics data source now works for different protocols and produces reasonable mock responses
* WIP sparkline component
* Mostly working metrics and graphs in topology
* Fix date in tooltip to actually be correct
* Clean up console.log
* Add loading frame and create a style sheet for Stats
* Various polish fixes:
- Loading state for graph
- Added fake latency cookie value to test loading
- If metrics provider has no series/stats for the service show something that doesn't look broken
- Graph hover works right to the edge now
- Stats boxes now wrap so they are either shown or not as will fit not cut off
- Graph resizes when browser window size changes
- Some tweaks to number formats and stat metrics to make them more compact/useful
* Thread Protocol through topology model correctly
* Rebuild assetfs
* Fix failing tests and remove stats-card now it's changed and become different
* Fix merge conflict
* Update api doublt
* more merge fixes
* Add data-permission and id attr to Card
* Run JS linter
* Move things around so the tests run with everything available
* Get tests passing:
1. Remove fakeLatency setTimeout (will be replaced with CONSUL_LATENCY
in mocks)
2. Make sure any event handlers are removed
* Make sure the Consul/scripts are available before the app
* Make sure interval gets set if there is no cookie value
* Upgrade mocks so we can use CONSUL_LATENCY
* Fix handling of no series values from Prometheus
* Update assetfs and fix a comment
* Rebase and rebuild assetfs; fix tcp metric series units to be bits not bytes
* Rebuild assetfs
* Hide stats when provider is not configured
Co-authored-by: kenia <keniavalladarez@gmail.com>
Co-authored-by: John Cowen <jcowen@hashicorp.com>
Previously, we would emit service usage metrics both with and without a
namespace label attached. This is problematic in the case when you want
to aggregate metrics together, i.e. "sum(consul.state.services)". This
would cause services to be counted twice in that aggregate, once via the
metric emitted with a namespace label, and once in the metric emited
without any namespace label.
This is the recommended proxy integration API for listing intentions
which should not require an active connection to the servers to resolve
after the initial cache filling.