diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js index 5f1ac13106..a0e53d5d65 100644 --- a/website/data/docs-navigation.js +++ b/website/data/docs-navigation.js @@ -234,6 +234,13 @@ export default [ ], }, 'encryption', + { + category: 'security-models', + content: [ + 'core', + 'nia', + ], + }, ], }, { diff --git a/website/pages/docs/security/index.mdx b/website/pages/docs/security/index.mdx index bf234978e8..2ffa1004d3 100644 --- a/website/pages/docs/security/index.mdx +++ b/website/pages/docs/security/index.mdx @@ -10,154 +10,19 @@ description: >- authentication. --- -# Security Model +## Security Models -Consul relies on both a lightweight gossip mechanism and an RPC system -to provide various features. Both of the systems have different security -mechanisms that stem from their designs. However, the security mechanisms -of Consul have a common goal: to provide -[confidentiality, integrity, and authentication](https://en.wikipedia.org/wiki/Information_security). +Requirements and recommendations for operating a secure Consul deployment may vary drastically depending on your +intended workloads, operating system, and environment. You can find detailed information about the various personas, +recommendations, requirements, and threats [here](/docs/security/security-models). -The [gossip protocol](/docs/internals/gossip) is powered by [Serf](https://www.serf.io/), -which uses a symmetric key, or shared secret, cryptosystem. There are more -details on the security of [Serf here](https://www.serf.io/docs/internals/security.html). -For details on how to enable Serf's gossip encryption in Consul, see the -[encryption doc here](/docs/agent/encryption). +## ACLs -The RPC system supports using end-to-end TLS with optional client authentication. -[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) is a widely deployed asymmetric -cryptosystem and is the foundation of security on the Web. 
+Consul provides an optional [Access Control List (ACL) system](/docs/security/acl) which can be used to control access +to data and APIs. -This means Consul communication is protected against eavesdropping, tampering, -and spoofing. This makes it possible to run Consul over untrusted networks such -as EC2 and other shared hosting providers. +## Encryption -## Secure Configuration - -The Consul threat model is only applicable if Consul is running in a secure -configuration. Consul does not operate in a secure-by-default configuration. If -any of the settings below are not enabled, then parts of this threat model are -going to be invalid. Additional security precautions must also be taken for -items outside of Consul's threat model as noted in sections below. - -- **Consul runs just like any other binary.** Consul runs as a single process - and obeys the same security requirements as any other application on - your system. Consul doesn't interact with the host system to change or - manipulate security values in any way. Take any precautions or remediation - steps that you would normally do for individual processes, based on your - operating system. - Some example remediation steps you could take are outlined below. - - - Run applications, including Consul, as non-root users with appropriate - configurations - - Implement Mandatory Access Control using a kernel security module such as SELinux - - Secure against unprivileged users becoming root - -- **ACLs enabled with default deny.** Consul must be configured to use ACLs with - an allowlist (default deny) approach. This forces all requests to have explicit - anonymous access or provide an ACL token. - -- **Encryption enabled.** TCP and UDP encryption must be enabled and configured - to prevent plaintext communication between Consul agents. At a minimum, - `verify_outgoing` should be enabled to verify server authenticity with each - server having a unique TLS certificate. 
`verify_server_hostname` is also - required to prevent a compromised agent restarting as a server and being given - access to all secrets. - - `verify_incoming` provides additional agent verification via mutual - authentication, but isn't _strictly_ necessary to enforce the threat model - since requests must also contain a valid ACL token. The subtlety is that - currently `verify_incoming = false` will allow servers to still accept - un-encrypted connections from clients (to allow for gradual TLS rollout). That - alone doesn't violate the threat model, but any misconfigured client that - chooses not to use TLS will violate the model. We recommend setting this to - true. If it is left as false care must be taken to ensure all consul clients - use `verify_outgoing = true` as noted above, but also all external API/UI - access must be via HTTPS with HTTP listeners disabled. - -### Known Insecure Configurations - -In addition to configuring the non-default settings above, Consul has several -non-default options that potentially present additional security risks. - -- **Script checks enabled with network-exposed API.** If a Consul agent (client - or server) exposes its HTTP API to the network beyond localhost, - [`enable_script_checks`](/docs/agent/options#_enable_script_checks) must - be `false` otherwise, even with ACLs configured, script checks present a - remote code execution threat. - [`enable_local_script_checks`](/docs/agent/options#_enable_local_script_checks) - provides a secure alternative if the HTTP API must be exposed and is available - from 1.3.0 on. This feature was also back-ported to patch releases 0.9.4, - 1.1.1, and 1.2.4 [as described here](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations). - -- **Remote exec enabled.** Consul includes a [`consul exec` - feature](/commands/exec) allowing execution of arbitrary commands - across the cluster. This is disabled by default since 0.8.0. 
We recommend - leaving it disabled. If enabled, extreme care must be taken to ensure correct - ACLs restrict access, for example any management token grants access to - execute arbitrary code on the cluster. - -- **Verify Server Hostname Used Alone.** From version 0.5.1 to 1.4.0 we documented that - `verify_server_hostname` being `true` _implied_ `verify_outgoing` however due - to a bug this was not the case so setting _only_ `verify_server_hostname` - results in plaintext communication between client and server. See - [CVE-2018-19653](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-19653) - for more details. This is fixed in 1.4.1. - -## Threat Model - -The following are parts of the Consul threat model: - -- **Consul agent-to-agent communication.** Communication between Consul agents should be secure from eavesdropping. This requires transport encryption to be enabled on the cluster and covers both TCP and UDP traffic. - -- **Consul agent-to-CA communication.** Communication between the Consul server and the configured certificate authority provider for Connect is always encrypted. - -- **Tampering of data in transit.** Any tampering should be detectable and cause Consul to avoid processing the request. - -- **Access to data without authentication or authorization.** All requests must be authenticated and authorized. This requires that ACLs are enabled on the cluster with a default deny mode. - -- **State modification or corruption due to malicious messages.** Ill-formatted messages are discarded and well-formatted messages require authentication and authorization. - -- **Non-server members accessing raw data.** All servers must join the cluster (with proper authentication and authorization) to begin participating in Raft. Raft data is transmitted over TLS. - -- **Denial of Service against a node.** DoS attacks against a node should not compromise the security stance of the software. 
- -- **Connect-based Service-to-Service communication.** Communications between two Connect-enabled services (natively or by proxy) should be secure from eavesdropping and provide authentication. This is achieved via mutual TLS. - -The following are _not_ part of the Consul threat model for Consul server agents: - -- **Access (read or write) to the Consul data directory.** All Consul servers, including non-leaders, persist the full set of Consul state to this directory. The data includes all KV, service registrations, ACL tokens, Connect CA configuration, and more. Any read or write to this directory allows an attacker to access and tamper with that data. - -- **Access (read or write) to the Consul configuration directory.** Consul configuration can enable or disable the ACL system, modify data directory paths, and more. Any read or write of this directory allows an attacker to reconfigure many aspects of Consul. By disabling the ACL system, this may give an attacker access to all Consul data. - -- **Memory access to a running Consul server agent.** If an attacker is able to inspect the memory state of a running Consul server agent the confidentiality of almost all Consul data may be compromised. If you're using an external Connect CA, the root private key material is never available to the Consul process and can be considered safe. Service Connect TLS certificates should be considered compromised; they are never persisted by server agents but do exist in-memory during at least the duration of a Sign request. - -The following are _not_ part of the Consul threat model for Consul client agents: - -- **Access (read or write) to the Consul data directory.** Consul clients will use the data directory to cache local state. This includes local services, associated ACL tokens, Connect TLS certificates, and more. Read or write access to this directory will allow an attacker to access this data. This data is typically a smaller subset of the full data of the cluster. 
- -- **Access (read or write) to the Consul configuration directory.** Consul client configuration files contain the address and port information of services, default ACL tokens for the agent, and more. Access to Consul configuration could enable an attacker to change the port of a service to a malicious port, register new services, and more. Further, some service definitions have ACL tokens attached that could be used cluster-wide to impersonate that service. An attacker cannot change cluster-wide configurations such as disabling the ACL system. - -- **Memory access to a running Consul client agent.** The blast radius of this is much smaller than a server agent but the confidentiality of a subset of data can still be compromised. Particularly, any data requested against the agent's API including services, KV, and Connect information may be compromised. If a particular set of data on the server was never requested by the agent, it never enters the agent's memory since replication only exists between servers. An attacker could also potentially extract ACL tokens used for service registration on this agent, since the tokens must be stored in-memory alongside the registered service. - -- **Network access to a local Connect proxy or service.** Communications between a service and a Connect-aware proxy are generally unencrypted and must happen over a trusted network. This is typically a loopback device. This requires that other processes on the same machine are trusted, or more complex isolation mechanisms are used such as network namespaces. This also requires that external processes cannot communicate to the Connect service or proxy (except on the inbound port). Therefore, non-native Connect applications should only bind to non-public addresses. 
- -- **Improperly Implemented Connect proxy or service.** A Connect proxy or natively integrated service must correctly serve a valid leaf certificate, verify the inbound TLS client certificate, and call the Consul agent-local authorize endpoint. If any of this isn't performed correctly, the proxy or service may allow unauthenticated or unauthorized connections. - -## External Threat Overview - -There are four components that affect the Consul threat model: the server agent, the client agent, the Connect CA, and Consul API clients (including proxies for Connect). - -The server agent participates in leader election and data replication via Raft. All communications with other agents is encrypted. Data is stored at rest unencrypted in the configured data directory. The stored data includes ACL tokens and TLS certificates. If the built-in CA is used with Connect, root certificate private keys are also stored on disk. External CA providers do not store data in this directory. This data directory must be carefully protected to prevent an attacker from impersonating a server or specific ACL user. We plan to introduce further mitigations (including at least partial data encryption) to the data directory over time, but the data directory should always be considered secret. - -For a client agent to join a cluster, it must provide a valid ACL token with node:write capabilities. The join request and all other API requests between the client and server agents communicate via TLS. Clients serve the Consul API and forward all requests to a server over a shared TLS connection. Each request contains an ACL token which is used for both authentication and authorization. Requests that do not provide an ACL token inherit the agent-configurable default ACL token. - -The Connect CA provider is responsible for storing the private key of the root (or intermediate) certificate used to sign and verify connections established via Connect. 
Consul server agents communicate with the CA provider via an encrypted method. This method is dependent on the CA provider in use. Consul provides a built-in CA which performs all operations locally on the server agent. Consul itself does not store any private key material except for the built-in CA. - -Consul API clients (the agent itself, the built-in UI, external software) must communicate to a Consul agent over TLS and must provide an ACL token per request for authentication and authorization. - -## Network Ports - -For configuring network rules to support Consul, please see [Ports Used](/docs/agent/options#ports) -for a listing of network ports used by Consul and details about which features -they are used for. +The Consul agent supports encrypting all of its network traffic. The exact method of encryption is described on the +[encryption security page](/docs/security/encryption). There are two separate encryption systems, one for gossip +traffic and one for HTTP + RPC. \ No newline at end of file diff --git a/website/pages/docs/security/security-models/core.mdx b/website/pages/docs/security/security-models/core.mdx new file mode 100644 index 0000000000..6448d09598 --- /dev/null +++ b/website/pages/docs/security/security-models/core.mdx @@ -0,0 +1,412 @@ +--- +layout: docs +page_title: Consul Core Security Model +sidebar_title: Core +description: >- + Security model including requirements, recommendations, and threats for the core Consul product. +--- + +## Overview + +Consul enables automation of network configurations, service discovery, and secure network connectivity across any +cloud or runtime. + +Consul uses a lightweight gossip and RPC system which provides various essential features. Both of these systems +provide security mechanisms which should be used to enable confidentiality, integrity and authentication. + +Using defense in depth is crucial for Consul security, and deployment requirements may differ drastically depending on +your use case. 
Some security features for multi-tenant deployments are offered exclusively in the +[Enterprise](/docs/enterprise) version. This documentation may need to be adapted to your +environment, but the general mechanisms for a secure Consul deployment revolve around: + +- **mTLS** - Mutual authentication of both the TLS server and client x509 certificates prevents internal abuse from + unauthorized access to network components within the cluster. + +- **ACLs** - Enable role-based access controls for authenticated connections by granting capabilities for an individual + human, or machine operator identity via an ACL token to authorize actions within a cluster. Optionally, custom + [authentication methods](/docs/security/acl/auth-methods) can be used to enable trusted external parties to authorize + ACL token creation. + +- **Namespaces** - Read and write operations can be scoped to a logical namespace to restrict + access to Consul components within a multi-tenant environment. + +- **Sentinel Policies** - Sentinel policies enable policy-as-code for granular control over + the built-in key-value store. + +### Personas + +It helps to consider the following types of personas when managing the security requirements of a Consul deployment. +The granularity may change depending on your team's requirements. + +- **System Administrator** - This is someone who has access to the underlying infrastructure to the Consul cluster. + Often they have access to SSH or RDP directly into a server within a cluster through a bastion host. Ultimately they + have read, write and execute permissions for the actual Consul binary. This binary is the same for server and client + agents using different configuration files. These users potentially have sudo, administrative, or some other + super-user access to the underlying compute resource. They have access to all persisted data on disk, or in memory. + This would include ACL tokens, certificates, and other secrets stored on the system. 
Users like these are essentially + totally trusted, as they have administrative rights to the underlying operating system with the ability to configure, + start, and stop the agent. + +- **Consul Administrator** - This is someone (probably the same System Administrator) who has access to define the + Consul agent configurations for servers and clients, and/or has a Consul management ACL token. They also have total + rights to all of the parts in the Consul system including the ability to manage all services within a cluster. + +- **Consul Operator** - This is someone who likely has restricted capabilities to use their namespace within a cluster. + +- **Developer** - This is someone who is responsible for creating, and possibly deploying, applications connected to or + configured with Consul. In some cases they may have no access, or only limited capabilities to view Consul information, + such as through metrics or logs. + +- **User** - This is the end user, using applications backed by services managed by Consul. In some cases services may + be public facing on the internet such as a web server, typically through a load balancer or ingress gateway. This is + someone who should not have any network access to the Consul agent APIs. + +### Secure Configuration + +Consul's security model is applicable only if all parts of the system are running with a secure configuration; **Consul +is not secure-by-default.** Without the following mechanisms enabled in Consul's configuration, it may be possible to +abuse access to a cluster. Like all security considerations, administrators must determine what is appropriate for their +environment and adapt these configurations accordingly. + +#### Requirements + +- **mTLS** - Mutual authentication of both the TLS server and client x509 certificates prevents internal abuse through + unauthorized access to Consul agents within the cluster.
+ + - [`verify_incoming`](/docs/agent/options#verify_incoming) - By default this is false, and should almost always be set + to true to require TLS verification for incoming client connections. This applies to both server RPC and to the + HTTPS API. + + - [`verify_incoming_https`](/docs/agent/options#verify_incoming_https) - By default this is false, and should be set + to true to require clients to provide a valid TLS certificate when the Consul HTTPS API is enabled. TLS for the API + may not be necessary if it is exclusively served over a loopback interface such as `localhost`. + + - [`verify_incoming_rpc`](/docs/agent/options#verify_incoming_rpc) - By default this is false, and should almost + always be set to true to require clients to provide a valid TLS certificate for Consul agent RPCs. + + - [`verify_outgoing`](/docs/agent/options#verify_outgoing) - By default this is false, and should be set to true to + require TLS for outgoing connections from server or client agents. Servers that specify `verify_outgoing = true` + will always talk to other servers over TLS, but they still accept non-TLS connections to allow for a transition of + all clients to TLS. Currently the only way to enforce that no client can communicate with a server unencrypted is + to also enable `verify_incoming`, which requires client certificates too. + + - [`enable_agent_tls_for_checks`](/docs/agent/options#enable_agent_tls_for_checks) - By default this is false, and + should almost always be set to true to require mTLS to set up the client for HTTP or gRPC health checks. This was + added in Consul 1.0.1. + + - [`verify_server_hostname`](/docs/agent/options#verify_server_hostname) - By default this is false, and should be + set to true to require that the TLS certificate presented by the servers matches the + `server.<datacenter>.<domain>` hostname for outgoing TLS connections. The default configuration does not verify the + hostname of the certificate, only that it is signed by a trusted CA.
This setting is critical to prevent a + compromised client agent from being restarted as a server and having all cluster state, including all ACL tokens and + Connect CA root keys, replicated to it. This setting was introduced in 0.5.1. From version 0.5.1 to 1.4.0 we + documented that `verify_server_hostname` being true implied `verify_outgoing`. However, due to a bug, this was not the + case, so setting only `verify_server_hostname` in those versions results in plaintext communication between client and server. See + [CVE-2018-19653](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-19653) for more details. This was fixed + in 1.4.1. + + - [`auto_encrypt`](/docs/agent/options#auto_encrypt) - Enables automated TLS certificate distribution for client + agent RPC communication using the Connect CA. With this configuration, a [`ca_file`](/docs/agent/options#ca_file) + and an ACL token still need to be distributed to client agents. + + - [`allow_tls`](/docs/agent/options#allow_tls) - By default this is false, and should be set to true on server + agents to allow certificates to be automatically generated and distributed from the Connect CA to client agents. + + - [`tls`](/docs/agent/options#tls) - By default this is false, and should be set to true on client agents to + automatically request a client TLS certificate from the server's Connect CA.
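As a hedged illustration of the per-listener options listed above, the following sketch requires client certificates for agent RPC while leaving the HTTPS API without client certificate verification. It assumes the HTTPS API is reachable only over loopback (the agent's `client_addr` defaults to `127.0.0.1`). The port number and certificate file names are placeholders chosen for this example, not values taken from this change.

```hcl
# Sketch only: split incoming verification by listener instead of
# using the combined verify_incoming option.
verify_incoming_rpc    = true   # require client certificates for agent RPC
verify_incoming_https  = false  # assumption: HTTPS API is loopback-only
verify_outgoing        = true
verify_server_hostname = true

# Use the agent's TLS material when running HTTP or gRPC health checks.
enable_agent_tls_for_checks = true

# Assumption: serve the HTTPS API on 8501; it binds to client_addr,
# which defaults to 127.0.0.1.
ports {
  https = 8501
}

# Placeholder file names for illustration.
ca_file   = "consul-agent-ca.pem"
cert_file = "dc1-server-consul-0.pem"
key_file  = "dc1-server-consul-0-key.pem"
```

If the HTTPS API is exposed beyond loopback, setting `verify_incoming_https = true` (or simply `verify_incoming = true`) is the safer choice.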
+ + **Example Server Agent TLS Configuration** + + ```hcl + verify_incoming = true + verify_outgoing = true + verify_server_hostname = true + + ca_file = "consul-agent-ca.pem" + cert_file = "dc1-server-consul-0.pem" + key_file = "dc1-server-consul-0-key.pem" + + auto_encrypt { + allow_tls = true + } + ``` + + **Example Client Agent TLS Configuration** + + ```hcl + verify_incoming = false + verify_outgoing = true + verify_server_hostname = true + + ca_file = "consul-agent-ca.pem" + + auto_encrypt { + tls = true + } + ``` + + -> The client agent TLS configuration from above sets [`verify_incoming`](/docs/agent/options#verify_incoming) to + false which assumes all incoming traffic is restricted to `localhost`. The primary benefit for this configuration + would be to avoid provisioning client TLS certificates (in addition to ACL tokens) for all tools or applications + using the local Consul agent. In this case ACLs should be enabled to provide authorization and only ACL tokens would + need to be distributed. + +- **ACLs** - The access control list (ACL) system provides a security mechanism for Consul administrators to grant + capabilities tied to an individual human, or machine operator identity. To ultimately secure the ACL system, + administrators should configure the [`default_policy`](/docs/agent/options#acl_default_policy) to "deny". + + The [system](/docs/acl/acl-system) is comprised of five major components: + + - **🗝 Token** - API key associated with policies, roles, or service identities. + + - **📜 Policy** - Set of rules to grant or deny access to various Consul resources. + + - **🎭 Role** - Grouping of policies, and service identities. + + - **👤 Service or Node Identity** - Synthetic policy granting a predefined set of permissions typical for services + deployed within Consul. + + - **🏷 Namespace** - a named, logical scoping of Consul Enterprise resources, typically to + enable multi-tenant environments. 
Consul OSS clusters always operate within the “default” namespace. + +- **Gossip Encryption** - A shared, base64-encoded 32-byte symmetric key is required to [encrypt Serf gossip + communication](https://learn.hashicorp.com/tutorials/consul/gossip-encryption-secure) within a cluster using + AES-GCM. The key size determines which AES encryption type is used: 16, 24, or 32 bytes select AES-128, AES-192, + or AES-256 respectively. 32-byte keys are preferable and are the default size generated by the + [`keygen`](/commands/keygen) command. This key should be + [regularly rotated](https://support.hashicorp.com/hc/en-us/articles/360044051754-Consul-Gossip-Key-Rotation) using + the builtin [keyring management](/commands/keyring) features of Consul. + + Two optional gossip encryption options enable Consul servers without gossip encryption to safely upgrade. After + upgrading, the verification options should be enabled, or removed to set them to their default state: + + - [`encrypt_verify_incoming`](/docs/agent/options#encrypt_verify_incoming) - By default this is true to enforce + encryption on *incoming* gossip communications. + + - [`encrypt_verify_outgoing`](/docs/agent/options#encrypt_verify_outgoing) - By default this is true to enforce + encryption on *outgoing* gossip communications. + +- **Namespaces** - Read and write operations should be scoped to logical namespaces to + restrict access to Consul components within a multi-tenant environment. Furthermore, this feature can be used to + enable a self-service approach to Consul ACL administration for teams within a scoped namespace. + +- **Sentinel Policies** - Sentinel policies allow for granular control over the builtin + key-value store. + +- **Ensure Script Checks are Disabled** - Consul’s agent optionally has an HTTP API, which can be exposed beyond + `localhost`.
If this is the case, `enable_script_checks` must be false; otherwise, even with ACLs configured, script + checks present a remote code execution threat. `enable_local_script_checks` provides a secure alternative if the + HTTP API must be exposed and is available from 1.3.0 on. This feature was also back-ported to patch releases 0.9.4, + 1.1.1, and 1.2.4 [as described here](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations). Neither option is enabled by default. + +- **Ensure Remote Execution is Disabled** - Consul includes a [`consul exec`](/commands/exec) feature allowing execution of arbitrary + commands across the cluster. This has been disabled by default since 0.8.0. We recommend leaving it disabled. If enabled, + extreme care must be taken to ensure that ACLs correctly restrict who can execute arbitrary code on the cluster. + +#### Recommendations + +- **Rotate Credentials** - Using short-lived credentials and rotating them frequently is highly recommended for + production environments to limit the blast radius from potentially compromised secrets, and to enable basic auditing. + + - **ACL Tokens** - Consul APIs require an ACL token to authorize actions within a cluster. + + - **X.509 Certificates** - Rotate certificates used by the Consul agent; e.g. integrate with Vault's PKI secret engine + to automatically generate and renew dynamic, unique X.509 certificates for each Consul node with a short TTL. Client + certificates can be automatically rotated by Consul when using `auto_encrypt` such that only server certificates + would be managed by Vault. + + - **Gossip Keys** - The encryption keys used by the internal gossip protocol for Consul agents can be + regularly rotated using the builtin keyring management features. + +- **Running without Root** - Consul agents can be run as unprivileged users that only require access to the + data directory. + +- **Linux Security Modules** - Use security modules that can be directly integrated into the operating system, such as + AppArmor, SELinux, and Seccomp, on Consul agent hosts.
+ +- **Customize TLS Settings** - TLS settings such as the [available cipher suites](/docs/agent/options#tls_cipher_suites) + should be tuned to fit the needs of your environment. + + - [`tls_min_version`](/docs/agent/options#tls_min_version) - Used to specify the minimum TLS version to use. + + - [`tls_cipher_suites`](/docs/agent/options#tls_cipher_suites) - Used to specify which TLS cipher suites are allowed. + + - [`tls_prefer_server_cipher_suites`](/docs/agent/options#tls_prefer_server_cipher_suites) - Used to prefer the + server's cipher suite order over the client's. + +- **Customize HTTP Response Headers** - Additional security headers, such as + [`X-XSS-Protection`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection), can be + [configured](https://www.consul.io/docs/agent/options#response_headers) for HTTP API responses. + + ```hcl + http_config { + response_headers { + "X-Frame-Options" = "DENY" + } + } + ``` + +- **Customize Default Limits** - Consul has a number of builtin features with default connection limits that should be + tuned to fit your environment. + + - [`http_max_conns_per_client`](/docs/agent/options#http_max_conns_per_client) - Used to limit concurrent access from + a single client to the HTTP(S) endpoint on Consul agents. + + - [`https_handshake_timeout`](/docs/agent/options#https_handshake_timeout) - Used to time out TLS connections for the + HTTP(S) endpoint for Consul agents. + + - [`rpc_handshake_timeout`](/docs/agent/options#rpc_handshake_timeout) - Used to time out TLS connections for the RPC + endpoint for Consul agents. + + - [`rpc_max_conns_per_client`](/docs/agent/options#rpc_max_conns_per_client) - Used to limit concurrent access from a + single client to the RPC endpoint on Consul agents. + + - [`rpc_rate`](/docs/agent/options#rpc_rate) - Disabled by default, this is used to limit the rate (requests/second) of client + agents making RPC calls to server agents.
+ + - [`rpc_max_burst`](/docs/agent/options#rpc_max_burst) - Used as the token bucket size for client agents making RPC + calls to server agents. + + - [`kv_max_value_size`](/docs/agent/options#kv_max_value_size) - Used to configure the max number of bytes in a + key-value API request. + + - [`txn_max_req_len`](/docs/agent/options#txn_max_req_len) - Used to configure the max number of bytes in a + transaction API request. + +- **Secure UI Access** - Access to Consul’s builtin UI can be secured in various ways: + + - **mTLS** - Enabling HTTPS with mutual TLS authentication is recommended, but requires extra tooling to terminate + the mTLS connection, preferably on an operator's local machine using a proxy script. + + - **TLS** - Enabling HTTPS is recommended where mTLS may not be required for UI access, such as when ACLs are + configured with a default deny. + + - **ACL** - ACLs with a default deny policy enable safer UI access by preventing unauthorized access to sensitive + components within the cluster. + + - **Restrict HTTP Writes** - Using the `allow_write_http_from` configuration option restricts write capabilities on + agent endpoints to a list of CIDRs. + + **Example Agent Configuration** + + ```hcl + http_config { + allow_write_http_from = ["127.0.0.0/8"] + } + ``` + +### Threat Model + +The following are parts of the core Consul threat model: + +- **Consul agent-to-agent communication** - Communication between Consul agents should be secure from eavesdropping. + This requires transport encryption to be enabled on the cluster and covers both TCP and UDP traffic. + +- **Consul agent-to-CA communication** - Communication between the Consul server and the configured certificate + authority provider for Connect is always encrypted. + +- **Tampering of data in transit** - Any tampering should be detectable and cause Consul to avoid processing + the request.
+ +- **Access to data without authentication or authorization** - All requests must be authenticated and authorized. This +requires that ACLs are enabled on the cluster with a default deny mode. + +- **State modification or corruption due to malicious messages** - Ill-formatted messages are discarded and + well-formatted messages require authentication and authorization. + +- **Non-server members accessing raw data** - All servers must join the cluster (with proper authentication and + authorization) to begin participating in Raft. Raft data is transmitted over TLS. + +- **Denial of Service against a node** - DoS attacks against a node should not compromise the security stance of + the software. + +- **Connect-based Service-to-Service communication** - Communications between two Connect-enabled services (natively or + by proxy) should be secure from eavesdropping and provide authentication. This is achieved via mutual TLS. + +The following are not part of the threat model for server agents: + +- **Access (read or write) to the Consul data directory** - All Consul servers, including non-leaders, persist the full + set of Consul state to this directory. The data includes all KV, service registrations, ACL tokens, Connect CA + configuration, and more. Any read or write to this directory allows an attacker to access and tamper with that data. + +- **Access (read or write) to the Consul configuration directory** - Consul configuration can enable or disable the ACL + system, modify data directory paths, and more. Any read or write of this directory allows an attacker to reconfigure + many aspects of Consul. By disabling the ACL system, this may give an attacker access to all Consul data. + +- **Memory access to a running Consul server agent** - If an attacker is able to inspect the memory state of a running + Consul server agent the confidentiality of almost all Consul data may be compromised. 
If you're using an external + Connect CA, the root private key material is never available to the Consul process and can be considered safe. Service + Connect TLS certificates should be considered compromised; they are never persisted by server agents but do exist + in-memory during at least the duration of a Sign request. + +The following are not part of the threat model for client agents: + +- **Access (read or write) to the Consul data directory** - Consul clients will use the data directory to cache local + state. This includes local services, associated ACL tokens, Connect TLS certificates, and more. Read or write access + to this directory will allow an attacker to access this data. This data is typically a smaller subset of the full data + of the cluster. + +- **Access (read or write) to the Consul configuration directory** - Consul client configuration files contain the + address and port information of services, default ACL tokens for the agent, and more. Access to Consul configuration + could enable an attacker to change the port of a service to a malicious port, register new services, and more. + Further, some service definitions have ACL tokens attached that could be used cluster-wide to impersonate that + service. An attacker cannot change cluster-wide configurations such as disabling the ACL system. + +- **Memory access to a running Consul client agent** - The blast radius of this is much smaller than a server agent but + the confidentiality of a subset of data can still be compromised. Particularly, any data requested against the agent's + API including services, KV, and Connect information may be compromised. If a particular set of data on the server was + never requested by the agent, it never enters the agent's memory since replication only exists between servers. An + attacker could also potentially extract ACL tokens used for service registration on this agent, since the tokens must + be stored in-memory alongside the registered service. 
+ +- **Network access to a local Connect proxy or service** - Communications between a service and a Connect-aware proxy + are generally unencrypted and must happen over a trusted network. This is typically a loopback device. This requires + that other processes on the same machine are trusted, or more complex isolation mechanisms are used such as network + namespaces. This also requires that external processes cannot communicate to the Connect service or proxy (except on + the inbound port). Therefore, non-native Connect applications should only bind to non-public addresses. + +- **Improperly Implemented Connect proxy or service** - A Connect proxy or natively integrated service must correctly + serve a valid leaf certificate, verify the inbound TLS client certificate, and call the Consul agent-local authorize + endpoint. If any of this isn't performed correctly, the proxy or service may allow unauthenticated or unauthorized + connections. + +#### Internal Threats + +- **Operator** - A malicious internal Consul operator with a valid mTLS certificate and ACL token may still be a threat + to your cluster in certain situations, especially in multi-team deployments. They may accidentally or intentionally + abuse access to Consul components. Namespaces and Sentinel policies can help protect against this. + +- **Application** - A malicious internal application, such as a compromised third-party dependency with access to a + Consul agent, along with the TLS certificate or ACL token used by the local agent, could effectively do anything the + token permits. Consider enabling HTTPS for the local Consul agent API, enforcing full mutual TLS verification, + segmenting services using namespaces, as well as configuring OS users, groups, and file permissions to build a defense-in-depth approach.
+ +- **RPC** - Malicious actors with access to a Consul agent RPC endpoint may be able to impersonate Consul server agents + if mTLS is not properly configured to verify the client TLS certificate identity. Consul should also have ACLs enabled + with a default policy explicitly set to deny to require authorization. + +- **HTTP** - Malicious actors with access to a Consul agent HTTP(S) endpoint may be able to impersonate the agent’s + configured identity, and extract information from Consul when ACLs are disabled. + +- **DNS** - Malicious actors with access to a Consul agent DNS endpoint may be able to extract service catalog + information. + +- **Gossip** - Malicious actors with access to a Consul agent Serf gossip endpoint may be able to impersonate + agents within a datacenter. Gossip encryption should be enabled, with a regularly rotated gossip key. + +- **Proxy (xDS)** - Malicious actors with access to a Consul agent xDS endpoint may be able to extract Envoy service + information. When ACLs and HTTPS are enabled, the gRPC endpoint serving up the xDS service requires (m)TLS and a + valid ACL token. + +#### External Threats + +- **Agents** - External access to the Consul agent’s various network endpoints should be considered including the + gossip, HTTP, RPC, and gRPC ports. Furthermore, access through other services like SSH or `exec` functionality in + orchestration systems such as Nomad and Kubernetes may expose unencrypted information persisted to disk including + TLS certificates or ACL tokens. Access to the Consul agent directory is explicitly outside the scope of Consul’s + threat model and should only be exposed to authenticated and authorized users. + +- **Gateways** - Consul supports a variety of [gateways](/docs/connect/gateways) to allow traffic in and out of the + service mesh to support a variety of workloads. When using an internet-exposed gateway, you should be sure to harden + your Consul agent and host configurations.
In most configurations, ACLs, gossip encryption, and mTLS should be + enforced. If an [escape hatch override](https://www.consul.io/docs/connect/proxies/envoy#escape-hatch-overrides) is + required, the proxy configuration should be audited to ensure security configurations remain intact, and do not + violate Consul’s security model. diff --git a/website/pages/docs/security/security-models/index.mdx b/website/pages/docs/security/security-models/index.mdx new file mode 100644 index 0000000000..1feb447894 --- /dev/null +++ b/website/pages/docs/security/security-models/index.mdx @@ -0,0 +1,27 @@ +--- +layout: docs +page_title: Security Models +sidebar_title: Security Models +description: >- + Overview and links to various Consul security models. +--- + +## Overview + +Requirements and recommendations for operating a secure Consul deployment may vary drastically depending on your +intended workloads, operating system, and environment. Consul is not secure by default, but can be configured to satisfy +the security requirements for a wide range of use cases, from local developer environments without any configuration to +container orchestrators in production with ACL authorization and mTLS authentication. + +### Core + +The core Consul product provides several options for enabling encryption, authentication, and authorization +controls for a cluster. You can read more about the various personas, recommendations, requirements, and threats +[here](/docs/security/security-models/core). + +### NIA + +[Network Infrastructure Automation](/docs/nia) (NIA) enables dynamic updates to network infrastructure devices triggered +by service changes. Both the core Consul product's configuration and the configuration for the `consul-terraform-sync` +daemon used by NIA can affect the security of your deployment. You can read more about the various personas, +recommendations, requirements, and threats [here](/docs/security/security-models/nia).
diff --git a/website/pages/docs/security/security-models/nia.mdx b/website/pages/docs/security/security-models/nia.mdx new file mode 100644 index 0000000000..f085a4f921 --- /dev/null +++ b/website/pages/docs/security/security-models/nia.mdx @@ -0,0 +1,174 @@ +--- +layout: docs +page_title: Consul NIA Security Model +sidebar_title: Network Infrastructure Automation Tech Preview +description: >- + Security model including requirements, recommendations, and threats for Consul Network Infrastructure Automation (NIA). +--- + +## Overview + +Consul Network Infrastructure Automation (NIA) enables declarative workflows to handle Day-2 network security +infrastructure tasks for network, security, and operations teams. NIA uses [Terraform](https://www.terraform.io/) +to propagate Consul catalog changes and relevant configuration to network appliances or network APIs. + +### Personas + +When thinking about Consul NIA, it helps to consider the following types of base personas when managing the security +requirements for the cluster deployment. + +- **System Administrator** - This is someone who has access to the underlying infrastructure of the + Consul NIA daemon, and possibly the core Consul service. Often they have access to SSH directly + into a server within a cluster through a bastion host. Ultimately they have read, write and + execute permissions for the actual NIA daemon binary. These users potentially have sudo, + administrative, or some other super-user access to the underlying compute resource. Users like + these are essentially totally trusted by Consul NIA as they have administrative rights to the + system and can start or stop the daemon. + +- **Consul NIA Operator** - This is someone (probably the same System Administrator) who has access + to define the Consul NIA daemon configuration, and possibly a Consul ACL token, and other secrets to + run the daemon against various network infrastructure APIs.
They also have total rights to all of + the parts in the Consul NIA system including the ability to configure, start, and stop the daemon. + +- **Developer** - This is someone who is responsible for creating, and possibly deploying, applications + connected to, or configured with, Consul. In some cases they may have no access, or limited capabilities + to view Consul information, such as through metrics, or logs. + +- **User** - The end user of the applications and other services managed by the NIA daemon, who should + have no knowledge of or access to the daemon’s API endpoints, ACL tokens, certificates, or any other + piece of the system. + +### Secure Configuration + +Consul NIA’s security model is applicable only if all parts of the system are running with a secure +configuration; the daemon is not secure-by-default. Without the following mechanisms enabled in the +daemon’s configuration, it may be possible to abuse access to the daemon. Like all security +considerations, one must determine what concerns are appropriate for their environment, and adapt these +security concerns accordingly. + +#### Requirements + +- **Protect Configuration Files & Directories** - A dedicated NIA daemon user and group with limited + permissions should be created for production, along with directory and file permissions appropriately + scoped for your operating environment. + + The following example commands create a dedicated `consul-nia` system user, along with the supporting + directories and configuration file, and secure their permissions using + [`chown`](https://en.wikipedia.org/wiki/Chown) and [`chmod`](https://en.wikipedia.org/wiki/Chmod): + + ```shell-session + $ useradd --system --shell /bin/false consul-nia + $ mkdir -p /consul-nia/data + $ mkdir -p /consul-nia/config + $ echo "{ ... 
}" > /consul-nia/config/file.hcl + $ chown --recursive consul-nia:consul-nia /consul-nia + $ chmod -R 0750 /consul-nia/ + ``` + +- **Protect Consul KV Path or Namespaces** - Note the NIA daemon can monitor Consul services in other Namespaces. + This can be limited based on the ACL token used for the NIA daemon. + +- **Use Consul ACLs** - The Access Control List (ACL) system within Consul can be used to restrict access to + only the required parts of Consul for the NIA daemon to operate. + - **Read + Write** permission for Consul KV to the specified path, and namespace. + - **Read** permission for Consul Catalog for all of the selected services to be monitored, and their namespaces. + - **Read + Write** permission to update health checks, when using NIA health monitoring. + +#### Recommendations + +- **Use Dedicated Host** - The NIA daemon will potentially have access to critical secrets for your environment’s + network infrastructure. Using a hardened, dedicated host for supporting these sensitive operations is highly recommended. + Workload orchestrators, such as [HashiCorp Nomad](https://www.nomadproject.io/), also provide benefits of ensuring + uptime and isolation. + +- **Run without Root** - The NIA daemon does not require root or other administrative privileges to operate. + +- **Protect NIA Daemon API Endpoint** - Any network endpoints provided by, or exposed to the NIA Daemon should be + protected using Consul Connect and appropriate firewall rules. + +- **Use a centralized logging solution** - Export log entries within [syslog](https://en.wikipedia.org/wiki/Syslog) + generated from the NIA daemon to a centralized logging solution. + +- **Audit used Terraform providers** - [Terraform providers](https://www.terraform.io/docs/providers/index.html) that + are configured with the NIA daemon should be audited to ensure you’re only using providers from sources that + you trust.
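+
+A minimal sketch of a Consul ACL policy matching the permissions listed under **Use Consul ACLs** above. The KV prefix
+`consul-terraform-sync/` and the service name `web` are assumptions chosen for illustration; substitute the KV path and
+the monitored services configured for your daemon. Write access for health checks is not shown here.
+
+```hcl
+# Assumed KV path used by the NIA daemon; read + write is needed on this prefix only.
+key_prefix "consul-terraform-sync/" {
+  policy = "write"
+}
+
+# Hypothetical monitored service; grant read access only to the services the daemon watches.
+service "web" {
+  policy = "read"
+}
+
+# Catalog queries are typically filtered by node visibility as well, so node read access is included.
+node_prefix "" {
+  policy = "read"
+}
+```
+
+Listing individual services rather than using an empty `service_prefix` keeps the token closer to least privilege, at
+the cost of updating the policy whenever a new service is monitored.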
+ +### Threat Model + +The following are the parts of the NIA threat model: + +- **Consul agent communication** - In order to monitor the Consul Catalog for changes, the NIA daemon interacts with + Consul’s HTTP API on a local or remote server agent. This communication requires TLS transport encryption, preferably + using mTLS for mutual authentication. + +- **NIA Terraform communication** - Network connectivity to downstream infrastructure APIs managed by the NIA daemon’s + Terraform runs will need to be properly configured for secure access. + +- **Tampering of data in transit** - Any tampering should be detectable and cause the daemon to avoid processing the + request. + +- **Access to data without authentication or authorization** - Requests to the Consul agent should be authenticated and + authorized using (m)TLS and ACLs respectively. ACLs should be configured with the minimal permissions required for + your environment. + +- **Denial-of-Service** - DoS attacks against the NIA Daemon should not compromise the security of Consul, or Terraform, + but may impact any networking components relying on updates from the daemon to properly handle traffic within the + network. Access to the daemon should be prevented using firewall rules. + +The following are not a part of the threat model, as the NIA Daemon expects a secure configuration, while always +providing the default options for testing in local environments which cannot be automatically configured to be both +secure, and easily usable. However, these are valid concerns for Administrators and Operators to evaluate when hardening +a production deployment: + +- **Access (read or write) to the Consul NIA Configuration Files or Directory** - Necessary configuration for the daemon + process can be loaded from a single file or a directory of files. These configurations may contain secrets and can + enable/disable insecure features, or Terraform providers. 
+ +- **Access (read or write) to the Consul NIA Consul KV Path** - Access to the daemon’s Consul KV path may leak sensitive + information such as usernames, passwords, certificates, and tokens used by Terraform to provision infrastructure. + +- **Memory Access to a Running Consul NIA Daemon Process** - Direct access to the memory of the running daemon process + allows an attacker to extract sensitive information. + +- **Memory Access to a Running Terraform Process** - Direct access to the memory of the running Terraform process + managed by the daemon process allows an attacker to extract sensitive information. + +- **Access to the Terraform Binary** - Direct access to the Terraform binary used by the NIA daemon can allow an + attacker to extract sensitive information. + +- **Access to the Consul NIA Daemon Binary** - Direct access to the system binary used to start the NIA daemon can allow + an attacker to extract sensitive information. + +#### Internal Threats + +- **NIA Operator** - Someone with access to the NIA host and its related binaries or configuration files may be a + threat to your deployment, especially in multi-team deployments. They may accidentally or intentionally use a + malicious Terraform provider, or extract various secrets to cause harm to the network. Access to the NIA host should + be guarded. + +- **Consul Operator** - Someone with access to the backend Consul cluster who, similar to the NIA Operator, can + perform actions that may trigger Terraform runs. They may also have access to the namespace and KV path of the NIA + daemon, which could give unintended access to Terraform’s state file, which contains sensitive information. ACL + permissions for Consul should be carefully audited to ensure that no policies unintentionally leak the state file + containing sensitive information to other Consul operators within the cluster.
+ +- **System-bound Attackers** - Multi-tenant environments, especially container orchestrators, can introduce a number of + security concerns. These may include shared secrets, host volume access, and other sources of potential pivoting, or + privilege escalation from attackers with operating system-level access, or side-car container access, through various + means. Extra steps to configure OS, cluster, service, user, directory, and file permissions are essential for + implementing defense-in-depth within a production environment. + +#### External Threats + +- **Terraform Providers and Modules** - Potentially malicious providers or modules, or any malicious dependencies that are part + of the Terraform ecosystem could cause harm to the network, and may have access to secrets in order to make necessary + network changes. Terraform provider configuration should be audited, pinned to a version, and checked for potential + typo-squatting issues from the Terraform Registry. + +- **Network-bound Attackers** - Whenever a service is exposed to the open internet, which may be the case, you + need to consider external network attackers who may seek out hidden, unauthenticated, or otherwise vulnerable + endpoints. This can lead to larger security concerns if an attacker is able to pivot to internal resources from an external one. + +- **Leaking Secrets** - TLS certificates and tokens used by the Consul NIA daemon can enable external attackers to + access Consul, or Terraform resources. These secrets shouldn’t be hardcoded into configs uploaded to public + places like GitHub. \ No newline at end of file