consul/website/source/docs/internals/acl.html.markdown
2016-12-14 21:31:59 -08:00

624 lines
27 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: "docs"
page_title: "ACL System"
sidebar_current: "docs-internals-acl"
description: |-
Consul provides an optional Access Control List (ACL) system which can be used to control access to data and APIs. The ACL system is a Capability-based system that relies on tokens which can have fine grained rules applied to them. It is very similar to AWS IAM in many ways.
---
# ACL System
Consul provides an optional Access Control List (ACL) system which can be used to control
access to data and APIs. The ACL is
[Capability-based](https://en.wikipedia.org/wiki/Capability-based_security), relying
on tokens to which fine grained rules can be applied. It is very similar to
[AWS IAM](http://aws.amazon.com/iam/) in many ways.
## Scope
When the ACL system was launched in Consul 0.4, it was only possible to specify
policies for the KV store. In Consul 0.5, ACL policies were extended to service
registrations. In Consul 0.6, ACL's were further extended to restrict service
discovery mechanisms, user events, and encryption keyring operations.
## ACL Design
The ACL system is designed to be easy to use, fast to enforce, and flexible to new
policies, all while providing administrative insight.
Every token has an ID, name, type, and rule set. The ID is a randomly generated
UUID, making it unfeasible to guess. The name is opaque to Consul and human readable.
The type is either "client" (meaning the token cannot modify ACL rules) or "management"
(meaning the token is allowed to perform all actions).
The token ID is passed along with each RPC request to the servers. Agents
can be configured with an [`acl_token`](/docs/agent/options.html#acl_token) property
to provide a default token, but the token can also be specified by a client on a
[per-request basis](/docs/agent/http.html). ACLs were added in Consul 0.4, meaning
prior versions do not provide a token. This is handled by the special "anonymous"
token. If no token is provided, the rules associated with the anonymous token are
automatically applied: this allows policy to be enforced on legacy clients.
ACLs can also act in either a whitelist or blacklist mode depending
on the configuration of
[`acl_default_policy`](/docs/agent/options.html#acl_default_policy). If the
default policy is to deny all actions, then token rules can be set to whitelist
specific actions. In the inverse, the allow all default behavior is a blacklist
where rules are used to prohibit actions. By default, Consul will allow all
actions.
#### ACL Datacenter
Enforcement is always done by the server nodes. All servers must be configured
to provide an [`acl_datacenter`](/docs/agent/options.html#acl_datacenter) which
enables ACL enforcement but also specifies the authoritative datacenter. Consul
relies on [RPC forwarding](/docs/internals/architecture.html) to support
Multi-Datacenter configurations. However, because requests can be made
across datacenter boundaries, ACL tokens must be valid globally. To avoid
consistency issues, a single datacenter is considered authoritative and stores
the canonical set of tokens.
When a request is made to a server in a non-authoritative datacenter server, it
must be resolved into the appropriate policy. This is done by reading the token
from the authoritative server and caching the result for a configurable
[`acl_ttl`](/docs/agent/options.html#acl_ttl). The implication
of caching is that the cache TTL is an upper bound on the staleness of policy
that is enforced. It is possible to set a zero TTL, but this has adverse
performance impacts, as every request requires refreshing the policy via a
cross-datacenter WAN RPC call.
#### Outages and ACL Replication
The Consul ACL system is designed with flexible rules to accommodate for an outage
of the [`acl_datacenter`](/docs/agent/options.html#acl_datacenter) or networking
issues preventing access to it. In this case, it may be impossible for
servers in non-authoritative datacenters to resolve tokens. Consul provides
a number of configurable [`acl_down_policy`](/docs/agent/options.html#acl_down_policy)
choices to tune behavior. It is possible to deny or permit all actions or to ignore
cache TTLs and enter a fail-safe mode. The default is to ignore cache TTLs
for any previously resolved tokens and to deny any uncached tokens.
<a name="replication"></a>
Consul 0.7 added an ACL Replication capability that can allow non-authoritative
datacenter servers to resolve even uncached tokens. This is enabled by setting an
[`acl_replication_token`](/docs/agent/options.html#acl_replication_token) in the
configuration on the servers in the non-authoritative datacenters. With replication
enabled, the servers will maintain a replica of the authoritative datacenter's full
set of ACLs on the non-authoritative servers. The ACL replication token needs to be
a valid ACL token with management privileges, it can also be the same as the master
ACL token.
Replication occurs with a background process that looks for new ACLs approximately
every 30 seconds. Replicated changes are written at a rate that's throttled to
100 updates/second, so it may take several minutes to perform the initial sync of
a large set of ACLs.
If there's a partition or other outage affecting the authoritative datacenter,
and the [`acl_down_policy`](/docs/agent/options.html#acl_down_policy)
is set to "extend-cache", tokens will be resolved during the outage using the
replicated set of ACLs. An [ACL replication status](/docs/agent/http/acl.html#acl_replication_status)
endpoint is available to monitor the health of the replication process.
Locally-resolved ACLs will be cached using the [`acl_ttl`](/docs/agent/options.html#acl_ttl)
setting of the non-authoritative datacenter, so these entries may persist in the
cache for up to the TTL, even after the authoritative datacenter comes back online.
ACL replication can also be used to migrate ACLs from one datacenter to another
using a process like this:
1. Enable ACL replication in all datacenters to allow continuation of service
during the migration, and to populate the target datacenter. Verify replication
is healthy and caught up to the current ACL index in the target datacenter
using the [ACL replication status](/docs/agent/http/acl.html#acl_replication_status)
endpoint.
2. Turn down the old authoritative datacenter servers.
3. Rolling restart the servers in the target datacenter and change the
`acl_datacenter` configuration to itself. This will automatically turn off
replication and will enable the datacenter to start acting as the authoritative
datacenter, using its replicated ACLs from before.
3. Rolling restart the servers in other datacenters and change their `acl_datacenter`
configuration to the target datacenter.
#### Bootstrapping ACLs
Bootstrapping the ACL system is done by providing an initial [`acl_master_token`
configuration](/docs/agent/options.html#acl_master_token) which will be created
as a "management" type token if it does not exist. The [`acl_master_token`
](/docs/agent/options.html#acl_master_token) is only installed when a server acquires
cluster leadership. If you would like to install or change the
[`acl_master_token`](/docs/agent/options.html#acl_master_token), set the new value for
[`acl_master_token`](/docs/agent/options.html#acl_master_token) in the configuration
for all servers. Once this is done, restart the current leader to force a leader election.
## Rule Specification
A core part of the ACL system is a rule language which is used to describe the policy
that must be enforced.
Key policies are defined by coupling a prefix with a policy. The rules are enforced
using a longest-prefix match policy: Consul picks the most specific policy possible. The
policy is either "read", "write", or "deny". A "write" policy implies "read", and there is no
way to specify write-only. If there is no applicable rule, the
[`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is applied.
Service policies are defined by coupling a service name and a policy. The rules are
enforced using an longest-prefix match policy (this was an exact match in 0.5, but changed
in 0.5.1). The default rule, applied to any service that doesn't have a matching policy,
is provided using the empty string. A service policy is either "read", "write", or "deny".
A "write" policy implies "read", and there is no way to specify write-only. If there is no
applicable rule, the [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
applied. The "read" policy in a service ACL rule allows restricting access to
the discovery of that service prefix. More information about service discovery
and ACLs can be found [below](#discovery_acls).
The policy for the "consul" service is always "write" as it is managed internally by Consul.
User event policies are defined by coupling an event name prefix with a policy.
The rules are enforced using a longest-prefix match policy. The default rule,
applied to any user event without a matching policy, is provided by an empty
string. An event policy is one of "read", "write", or "deny". Currently, only
the "write" level is enforced during event firing. Events can always be read.
Prepared query policies control access to create, update, and delete prepared
queries. Service policies are used when executing prepared queries. See
[below](#prepared_query_acls) for more details.
We make use of
the [HashiCorp Configuration Language (HCL)](https://github.com/hashicorp/hcl/)
to specify policy. This language is human readable and interoperable
with JSON making it easy to machine-generate.
Specification in the HCL format looks like:
```javascript
# Default all keys to read-only
key "" {
policy = "read"
}
key "foo/" {
policy = "write"
}
key "foo/private/" {
# Deny access to the dir "foo/private"
policy = "deny"
}
# Default all services to allow registration. Also permits all
# services to be discovered.
service "" {
policy = "write"
}
# Deny registration access to services prefixed "secure-".
# Discovery of the service is still allowed in read mode.
service "secure-" {
policy = "read"
}
# Allow firing any user event by default.
event "" {
policy = "write"
}
# Deny firing events prefixed with "destroy-".
event "destroy-" {
policy = "deny"
}
# Default prepared queries to read-only.
query "" {
policy = "read"
}
# Read-only mode for the encryption keyring by default (list only)
keyring = "read"
# Read-only mode for Consul operator interfaces (list only)
operator = "read"
```
This is equivalent to the following JSON input:
```javascript
{
"key": {
"": {
"policy": "read"
},
"foo/": {
"policy": "write"
},
"foo/private": {
"policy": "deny"
}
},
"service": {
"": {
"policy": "write"
},
"secure-": {
"policy": "read"
}
},
"event": {
"": {
"policy": "write"
},
"destroy-": {
"policy": "deny"
}
},
"query": {
"": {
"policy": "read"
}
},
"keyring": "read",
"operator": "read"
}
```
## Building ACL Policies
#### Blacklist Mode and `consul exec`
If you set [`acl_default_policy`](/docs/agent/options.html#acl_default_policy)
to `deny`, the `anonymous` token won't have permission to read the default
`_rexec` prefix; therefore, Consul agents using the `anonymous` token
won't be able to perform [`consul exec`](/docs/commands/exec.html) actions.
Here's why: the agents need read/write permission to the `_rexec` prefix for
[`consul exec`](/docs/commands/exec.html) to work properly. They use that prefix
as the transport for most data.
You can enable [`consul exec`](/docs/commands/exec.html) from agents that are not
configured with a token by allowing the `anonymous` token to access that prefix.
This can be done by giving this rule to the `anonymous` token:
```javascript
key "_rexec/" {
policy = "write"
}
```
Alternatively, you can, of course, add an explicit
[`acl_token`](/docs/agent/options.html#acl_token) to each agent, giving it access
to that prefix.
#### Blacklist Mode and Service Discovery
If your [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
set to `deny`, the `anonymous` token will be unable to read any service
information. This will cause the service discovery mechanisms in the REST API
and the DNS interface to return no results for any service queries. This is
because internally the API's and DNS interface consume the RPC interface, which
will filter results for services the token has no access to.
You can allow all services to be discovered, mimicing the behavior of pre-0.6.0
releases, by configuring this ACL rule for the `anonymous` token:
```
service "" {
policy = "read"
}
```
The above will allow access for reading service information only. This
level of access allows discovering other services in the system, but is not
enough to allow the agent to sync its services and checks into the global
catalog during [anti-entropy](/docs/internals/anti-entropy.html).
The most secure way of handling service registration and discovery is to run
Consul 0.6+ and issue tokens with explicit access for the services or service
prefixes which are expected to run on each agent.
#### Blacklist mode and Events
Similar to the above, if your
[`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is set to
`deny`, the `anonymous` token will have no access to allow firing user events.
This deviates from pre-0.6.0 builds, where user events were completely
unrestricted.
Events have their own first-class expression in the ACL syntax. To restore
access to user events from arbitrary agents, configure an ACL rule like the
following for the `anonymous` token:
```
event "" {
policy = "write"
}
```
As always, the more secure way to handle user events is to explicitly grant
access to each API token based on the events they should be able to fire.
#### Blacklist Mode and Prepared Queries
After Consul 0.6.3, significant changes were made to ACLs for prepared queries,
including a new `query` ACL policy. See [Prepared Query ACLs](#prepared_query_acls) below for more details.
<a name="keyring"></a>
#### Blacklist Mode and Keyring Operations
Consul 0.6 and later supports securing the encryption keyring operations using
ACL's. Encryption is an optional component of the gossip layer. More information
about Consul's keyring operations can be found on the [keyring
command](/docs/commands/keyring.html) documentation page.
If your [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
set to `deny`, then the `anonymous` token will not have access to read or write
to the encryption keyring. The keyring policy is yet another first-class citizen
in the ACL syntax. You can configure the anonymous token to have free reign over
the keyring using a policy like the following:
```
keyring = "write"
```
Encryption keyring operations are sensitive and should be properly secured. It
is recommended that instead of configuring a wide-open policy like above, a
per-token policy is applied to maximize security.
<a name="operator"></a>
#### Blacklist Mode and Consul Operator Actions
Consul 0.7 added special Consul operator actions which are protected by a new
`operator` ACL policy. The operator actions cover:
* [Operator HTTP endpoint](/docs/agent/http/operator.html)
* [Operator CLI command](/docs/commands/operator.html)
If your [`acl_default_policy`](/docs/agent/options.html#acl_default_policy) is
set to `deny`, then the `anonymous` token will not have access to Consul operator
actions. Granting `read` access allows reading information for diagnostic purposes
without making any changes to state. Granting `write` access allows reading
information and changing state. Here's an example policy:
```
operator = "write"
```
~> Grant `write` access to operator actions with extreme caution, as improper use
could lead to a Consul outage and even loss of data.
#### Services and Checks with ACLs
Consul allows configuring ACL policies which may control access to service and
check registration. In order to successfully register a service or check with
these types of policies in place, a token with sufficient privileges must be
provided to perform the registration into the global catalog. Consul also
performs periodic [anti-entropy](/docs/internals/anti-entropy.html) syncs, which
may require an ACL token to complete. To accommodate this, Consul provides two
methods of configuring ACL tokens to use for registration events:
1. Using the [acl_token](/docs/agent/options.html#acl_token) configuration
directive. This allows a single token to be configured globally and used
during all service and check registration operations.
2. Providing an ACL token with service and check definitions at
registration time. This allows for greater flexibility and enables the use
of multiple tokens on the same agent. Examples of what this looks like are
available for both [services](/docs/agent/services.html) and
[checks](/docs/agent/checks.html). Tokens may also be passed to the
[HTTP API](/docs/agent/http.html) for operations that require them.
<a name="discovery_acls"></a>
#### Restricting service discovery with ACLs
In Consul 0.6, the ACL system was extended to support restricting read access to
service registrations. This allows tighter access control and limits the ability
of a compromised token to discover other services running in a cluster.
The ACL system permits a user to discover services using the REST API or UI if
the token used during requests has "read"-level access or greater. Consul will
filter out all services which the token has no access to in all API queries,
making it appear as though the restricted services do not exist.
Consul's DNS interface is also affected by restrictions to service
registrations. If the token used by the agent does not have access to a given
service, then the DNS interface will return no records when queried for it.
<a name="prepared_query_acls"></a>
## Prepared Query ACLs
As we've gotten feedback from Consul users, we've evolved how prepared queries
use ACLs. In this section we first cover the current implementation, and then we
follow that with details about what's changed between specific versions of Consul.
#### Managing Prepared Queries
Managing prepared queries includes creating, reading, updating, and deleting
queries. There are a few variations, each of which uses ACLs in one of two
ways: open, protected by unguessable IDs or closed, managed by ACL policies.
These variations are covered here, with examples:
* Static queries with no `Name` defined are not controlled by any ACL policies.
These types of queries are meant to be ephemeral and not shared to untrusted
clients, and they are only reachable if the prepared query ID is known. Since
these IDs are generated using the same random ID scheme as ACL Tokens, it is
infeasible to guess them. When listing all prepared queries, only a management
token will be able to see these types, though clients can read instances for
which they have an ID. An example use for this type is a query built by a
startup script, tied to a session, and written to a configuration file for a
process to use via DNS.
* Static queries with a `Name` defined are controlled by the
[`query`](/docs/internals/acl.html#prepared_query_acls) ACL policy.
Clients are required to have an ACL token with a prefix sufficient to cover
the name they are trying to manage, with a longest prefix match providing a
way to define more specific policies. Clients can list or read queries for
which they have "read" access based on their prefix, and similar they can
update any queries for which they have "write" access. An example use for
this type is a query with a well-known name (eg. `prod-master-customer-db`)
that is used and known by many clients to provide geo-failover behavior for
a database.
* [Template queries](https://www.consul.io/docs/agent/http/query.html#templates)
queries work like static queries with a `Name` defined, except that a catch-all
template with an empty `Name` requires an ACL token that can write to any query
prefix.
#### Executing Pepared Queries
When prepared queries are executed via DNS lookups or HTTP requests, the ACL
checks are run against the service being queried, similar to how ACLs work with
other service lookups. There are several ways the ACL token is selected for this
check:
* If an ACL Token was captured when the prepared query was defined, it will be
used to perform the service lookup. This allows queries to be executed by
clients with lesser or even no ACL Token, so this should be used with care.
* If no ACL Token was captured, then the client's ACL Token will be used to
perform the service lookup.
* If no ACL Token was captured and the client has no ACL Token, then the
anonymous token will be used to perform the service lookup.
In the common case, the ACL Token of the invoker is used
to test the ability to look up a service. If a `Token` was specified when the
prepared query was created, the behavior changes and now the captured
ACL Token set by the definer of the query is used when looking up a service.
Capturing ACL Tokens is analogous to
[PostgreSQLs](http://www.postgresql.org/docs/current/static/sql-createfunction.html)
`SECURITY DEFINER` attribute which can be set on functions, and using the client's ACL
Token is similar to the complementary `SECURITY INVOKER` attribute.
<a name="prepared_query_acl_changes"></a>
#### ACL Implementation Changes for Prepared Queries
Prepared queries were originally introduced in Consul 0.6.0, and ACL behavior remained
unchanged through version 0.6.3, but was then changed to allow better management of the
prepared query namespace.
These differences are outlined in the table below:
<table class="table table-bordered table-striped">
<tr>
<th>Operation</th>
<th>Version <= 0.6.3 </th>
<th>Version > 0.6.3 </th>
</tr>
<tr>
<td>Create static query without `Name`</td>
<td>The ACL Token used to create the prepared query is checked to make sure it can access the service being queried. This token is captured as the `Token` to use when executing the prepared query.</td>
<td>No ACL policies are used as long as no `Name` is defined. No `Token` is captured by default unless specifically supplied by the client when creating the query.</td>
</tr>
<tr>
<td>Create static query with `Name`</td>
<td>The ACL Token used to create the prepared query is checked to make sure it can access the service being queried. This token is captured as the `Token` to use when executing the prepared query.</td>
<td>The client token's `query` ACL policy is used to determine if the client is allowed to register a query for the given `Name`. No `Token` is captured by default unless specifically supplied by the client when creating the query.</td>
</tr>
<tr>
<td>Manage static query without `Name`</td>
<td>The ACL Token used to create the query, or a management token must be supplied in order to perform these operations.</td>
<td>Any client with the ID of the query can perform these operations.</td>
</tr>
<tr>
<td>Manage static query with a `Name`</td>
<td>The ACL token used to create the query, or a management token must be supplied in order to perform these operations.</td>
<td>Similar to create, the client token's `query` ACL policy is used to determine if these operations are allowed.</td>
</tr>
<tr>
<td>List queries</td>
<td>A management token is required to list any queries.</td>
<td>The client token's `query` ACL policy is used to determine which queries they can see. Only management tokens can see prepared queries without `Name`.</td>
</tr>
<tr>
<td>Execute query</td>
<td>Since a `Token` is always captured when a query is created, that is used to check access to the service being queried. Any token supplied by the client is ignored.</td>
<td>The captured token, client's token, or anonymous token is used to filter the results, as described above.</td>
</tr>
</table>
<a name="version_8_acls"></a>
## ACL Changes Coming in Consul 0.8
Consul 0.8 will feature complete ACL coverage for all of Consul. To ease the
transition to the new policies, a beta version of complete ACL support is
available starting in Consul 0.7.2.
Here's a summary of the upcoming changes:
* Agents now check `node` and `service` ACL policies for catalog-related operations
in `/v1/agent` endpoints, such as service and check registration and health check
updates.
* Agents enforce a new `agent` ACL policy for utility operations in `/v1/agent`
endpoints, such as joins and leaves.
* A new `node` ACL policy is enforced throughout Consul, providing a mechanism to
restrict registration and discovery of nodes by name. This also applies to
service discovery, so provides an additional dimension for controlling access to
services.
* A new `session` ACL policy controls the ability to create session objects by node
name.
* Anonymous prepared queries (non-templates without a `Name`) now require a valid
session, which ties their creation to the new `session` ACL policy.
* The existing `event` ACL policy has been applied to the `/v1/event/list` endpoint.
#### New Configuration Options
To enable beta support for complete ACL coverage, set the
[`acl_enforce_version_8`](/docs/agent/options.html#acl_enforce_version_8) configuration
option to `true` on Consul clients and servers.
Two new configuration options are used once complete ACLs are enabled:
* [`acl_agent_master_token`](/docs/agent/options.html#acl_agent_master_token) is used as
a special access token that has `agent` ACL policy `write` privileges on each agent where
it is configured. This token should only be used by operators during outages when Consul
servers aren't available to resolve ACL tokens. Applications should use regular ACL
tokens during normal operation.
* [`acl_agent_token`](/docs/agent/options.html#acl_agent_token) is used internally by
Consul agents to perform operations to the service catalog when registering themselves
or sending network coordinates to the servers.
<br>
<br>
For clients, this token must at least have `node` ACL policy `write` access to the node
name it will register as. For servers, this must have `node` ACL policy `write` access to
all nodes that are expected to join the cluster, as well as `service` ACL policy `write`
access to the `consul` service, which will be registered automatically on its behalf.
Since clients now resolve ACLs locally, the [`acl_down_policy`](/docs/agent/options.html#acl_down_policy)
now applies to Consul clients as well as Consul servers. This will determine what the
client will do in the event that the servers are down.
Consul clients *do not* need to have the [`acl_master_token`](/docs/agent/options.html#acl_agent_master_token)
or the [`acl_datacenter`](/docs/agent/options.html#acl_datacenter) configured. They will
contact the Consul servers to determine if ACLs are enabled. If they detect that ACLs are
not enabled, they will check at most every 2 minutes to see if they have become enabled, and
will start enforcing ACLs automatically.
#### New ACL Policies
The new `agent` ACL policy looks like this:
```
agent "<node name prefix>" {
policy = "<read|write|deny>"
}
```
This affects utility-related agent endpoints, such as `/v1/agent/self` and `/v1/agent/join`.
The new `node` ACL policy looks like this:
```
node "<node name prefix>" {
policy = "<read|write|deny>"
}
````
This affects node registration, node discovery, service discovery, and endpoints like
`/v1/agent/members`.
The new `session` ACL policy looks like this:
```
session "<node name prefix>" {
policy = "<read|write|deny>"
}
```
This affects all the of `/v1/session` endpoints.