diff --git a/website/content/api-docs/agent/check.mdx b/website/content/api-docs/agent/check.mdx index eafbb17c42..785fbce8b3 100644 --- a/website/content/api-docs/agent/check.mdx +++ b/website/content/api-docs/agent/check.mdx @@ -6,7 +6,10 @@ description: The /agent/check endpoints interact with checks on the local agent # Check - Agent HTTP API -The `/agent/check` endpoints interact with checks on the local agent in Consul. +Consul's health check capabilities are described in the +[health checks overview](/docs/discovery/checks). +The `/agent/check` endpoints interact with health checks +managed by the local agent in Consul. These should not be confused with checks in the catalog. ## List Checks @@ -418,6 +421,10 @@ $ curl \ This endpoint is used with a TTL type check to set the status of the check to `critical` and to reset the TTL clock. +If you want to manually mark a service as unhealthy, +use [maintenance mode](/api-docs/agent#enable-maintenance-mode) +instead of defining a TTL health check and using this endpoint. + | Method | Path | Produces | | ------ | ----------------------------- | ------------------ | | `PUT` | `/agent/check/fail/:check_id` | `application/json` | @@ -456,6 +463,10 @@ $ curl \ This endpoint is used with a TTL type check to set the status of the check and to reset the TTL clock. +If you want to manually mark a service as unhealthy, +use [maintenance mode](/api-docs/agent#enable-maintenance-mode) +instead of defining a TTL health check and using this endpoint. + | Method | Path | Produces | | ------ | ------------------------------- | ------------------ | | `PUT` | `/agent/check/update/:check_id` | `application/json` | diff --git a/website/content/api-docs/health.mdx b/website/content/api-docs/health.mdx index 898c8ffe41..cad74bbad2 100644 --- a/website/content/api-docs/health.mdx +++ b/website/content/api-docs/health.mdx @@ -14,6 +14,9 @@ optional health checking mechanisms. 
Additionally, some of the query results from the health endpoints are filtered while the catalog endpoints provide the raw entries. +To modify health check registration or information, +use the [`/agent/check`](/api-docs/agent/check) endpoints. + ## List Checks for Node This endpoint returns the checks specific to the node provided on the path. diff --git a/website/content/docs/discovery/checks.mdx b/website/content/docs/discovery/checks.mdx index 5a21495793..1b4c4faf4b 100644 --- a/website/content/docs/discovery/checks.mdx +++ b/website/content/docs/discovery/checks.mdx @@ -13,144 +13,72 @@ description: >- One of the primary roles of the agent is management of system-level and application-level health checks. A health check is considered to be application-level if it is associated with a service. If not associated with a service, the check monitors the health of the entire node. -Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) to get a more complete example on how to leverage health check capabilities in Consul. -A check is defined in a configuration file or added at runtime over the HTTP interface. Checks -created via the HTTP interface persist with that node. +Review the [service health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) +to get a more complete example on how to leverage health check capabilities in Consul. -There are several different kinds of checks: +## Registering a health check -- Script + Interval - These checks depend on invoking an external application - that performs the health check, exits with an appropriate exit code, and potentially - generates some output. A script is paired with an invocation interval (e.g. - every 30 seconds). This is similar to the Nagios plugin system. The output of - a script check is limited to 4KB. Output larger than this will be truncated. 
- By default, Script checks will be configured with a timeout equal to 30 seconds. - It is possible to configure a custom Script check timeout value by specifying the - `timeout` field in the check definition. When the timeout is reached on Windows, - Consul will wait for any child processes spawned by the script to finish. For any - other system, Consul will attempt to force-kill the script and any child processes - it has spawned once the timeout has passed. - In Consul 0.9.0 and later, script checks are not enabled by default. To use them you - can either use : +There are three ways to register a service with health checks: - - [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks): - enable script checks defined in local config files. Script checks defined via the HTTP - API will not be allowed. - - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable - script checks regardless of how they are defined. +1. Start or reload a Consul agent with a service definition file in the + [agent's configuration directory](/docs/agent#configuring-consul-agents). +1. Call the + [`/agent/service/register`](/api-docs/agent/service#register-service) + HTTP API endpoint to register the service. +1. Use the + [`consul services register`](/commands/services/register) + CLI command to register the service. - ~> **Security Warning:** Enabling script checks in some configurations may - introduce a remote execution vulnerability which is known to be targeted by - malware. We strongly recommend `enable_local_script_checks` instead. See [this - blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations) - for more details. +When a service is registered using the HTTP API endpoint or CLI command, +the checks persist in the Consul data folder across Consul agent restarts. 
-- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL, - waiting the specified `interval` amount of time between requests (eg. 30 seconds). - The status of the service depends on the HTTP response code: any `2xx` code is - considered passing, a `429 Too ManyRequests` is a warning, and anything else is - a failure. This type of check - should be preferred over a script that uses `curl` or another external process - to check a simple HTTP operation. By default, HTTP checks are `GET` requests - unless the `method` field specifies a different method. Additional header - fields can be set through the `header` field which is a map of lists of - strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks will be - configured with a request timeout equal to 10 seconds. +## Types of checks - It is possible to configure a custom HTTP check timeout value by - specifying the `timeout` field in the check definition. The output of the - check is limited to roughly 4KB. Responses larger than this will be truncated. - HTTP checks also support TLS. By default, a valid TLS certificate is expected. - Certificate verification can be turned off by setting the `tls_skip_verify` - field to `true` in the check definition. When using TLS, the SNI will be set - automatically from the URL if it uses a hostname (as opposed to an IP address); - the value can be overridden by setting `tls_server_name`. +This section describes the available types of health checks you can use to +automatically monitor the health of a service instance or node. - Consul follows HTTP redirects by default. Set the `disable_redirects` field to - `true` to disable redirects. +-> **To manually mark a service unhealthy:** Use the maintenance mode + [CLI command](/commands/maint) or + [HTTP API endpoint](/api-docs/agent#enable-maintenance-mode) + to temporarily remove one or all service instances on a node + from service discovery DNS and HTTP API query results. 
-- `TCP + Interval` - These checks make a TCP connection attempt to the specified - IP/hostname and port, waiting `interval` amount of time between attempts - (e.g. 30 seconds). If no hostname - is specified, it defaults to "localhost". The status of the service depends on - whether the connection attempt is successful (ie - the port is currently - accepting connections). If the connection is accepted, the status is - `success`, otherwise the status is `critical`. In the case of a hostname that - resolves to both IPv4 and IPv6 addresses, an attempt will be made to both - addresses, and the first successful connection attempt will result in a - successful check. This type of check should be preferred over a script that - uses `netcat` or another external process to check a simple socket operation. - By default, TCP checks will be configured with a request timeout of 10 seconds. - It is possible to configure a custom TCP check timeout value by specifying the - `timeout` field in the check definition. +### Script check ((#script-interval)) -- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams - to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time - between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`. - The default interval for, UDP checks is `10s`, but you can configure a custom UDP check timeout value by specifying the - `timeout` field in the check definition. If any timeout on read exists, the check is still considered healthy. +Script checks periodically invoke an external application that performs the health check, +exits with an appropriate exit code, and potentially generates some output. +The specified `interval` determines the time between check invocations. +The output of a script check is limited to 4KB. +Larger outputs are truncated. 
-- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state - for a given TTL. The state of the check must be updated periodically over the HTTP - interface. If an external system fails to update the status within a given TTL, - the check is set to the failed state. This mechanism, conceptually similar to a - dead man's switch, relies on the application to directly report its health. For - example, a healthy app can periodically `PUT` a status update to the HTTP endpoint; - if the app fails, the TTL will expire and the health check enters a critical state. - The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass), - [warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail), - and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their - last known status to disk. This allows the Consul agent to restore the last known - status of the check across restarts. Persisted check status is valid through the - end of the TTL from the time of the last check. +By default, script checks are configured with a timeout equal to 30 seconds. +To configure a custom script check timeout value, +specify the `timeout` field in the check definition. +After reaching the timeout on a Windows system, +Consul waits for any child processes spawned by the script to finish. +After reaching the timeout on other systems, +Consul attempts to force-kill the script and any child processes it spawned. -- `Docker + Interval` - These checks depend on invoking an external application which - is packaged within a Docker Container. The application is triggered within the running - container via the Docker Exec API. We expect that the Consul agent user has access - to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to - determine the Docker API endpoint. 
The application is expected to run, perform a health - check of the service running inside the container, and exit with an appropriate exit code. - The check should be paired with an invocation interval. The shell on which the check - has to be performed is configurable which makes it possible to run containers which - have different shells on the same host. Check output for Docker is limited to - 4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent - must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) - set to `true` in order to enable Docker health checks. +Script checks are not enabled by default. +To enable a Consul agent to perform script checks, +use one of the following agent configuration options: -- `gRPC + Interval` - These checks are intended for applications that support the standard - [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). - The state of the check will be updated by probing the configured endpoint, waiting `interval` - amount of time between probes (eg. 30 seconds). By default, gRPC checks will be configured - with a default timeout of 10 seconds. - It is possible to configure a custom timeout value by specifying the `timeout` field in - the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by - setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid - TLS certificate is expected. Certificate verification can be turned off by setting the - `tls_skip_verify` field to `true` in the check definition. - To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`. +- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks): + Enable script checks defined in local config files. 
+ Script checks registered using the HTTP API are not allowed. +- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): + Enable script checks no matter how they are registered. -- `H2ping + Interval` - These checks test an endpoint that uses http2 - by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default. - To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful - within a specified timeout, then the check is updated as passing. - The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled a valid - certificate is required, unless `tls_skip_verify` is set to `true`. - The check will be run on the interval specified by the `interval` field. + ~> **Security Warning:** + Enabling non-local script checks in some configurations may introduce + a remote execution vulnerability known to be targeted by malware. + We strongly recommend `enable_local_script_checks` instead. + For more information, refer to + [this blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations). -- `Alias` - These checks alias the health state of another registered - node or service. The state of the check will be updated asynchronously, but is - nearly instant. For aliased services on the same agent, the local state is monitored - and no additional network resources are consumed. For other services and nodes, - the check maintains a blocking query over the agent's connection with a current - server and allows stale requests. If there are any errors in watching the aliased - node or service, the check state will be critical. For the blocking query, the - check will use the ACL token set on the service or check definition or otherwise - will fall back to the default ACL token set with the agent (`acl_token`). 
- -## Check Definition - -A script check: +The following service definition file snippet is an example +of a script check definition: @@ -162,7 +90,6 @@ check = { interval = "10s" timeout = "1s" } - ``` ```json @@ -179,7 +106,47 @@ check = { -A HTTP check: +#### Check script conventions + +A check script's exit code is used to determine the health check status: + +- Exit code 0 - Check is passing +- Exit code 1 - Check is warning +- Any other code - Check is failing + +Any output of the script is captured and made available in the +`Output` field of checks included in HTTP API responses, +as in this example from the [local service health endpoint](/api-docs/agent/service#by-name-json). + +### HTTP check ((#http-interval)) + +HTTP checks periodically make an HTTP `GET` request to the specified URL, +waiting the specified `interval` amount of time between requests. +The status of the service depends on the HTTP response code: any `2xx` code is +considered passing, a `429 Too Many Requests` is a warning, and anything else is +a failure. This type of check +should be preferred over a script that uses `curl` or another external process +to check a simple HTTP operation. By default, HTTP checks are `GET` requests +unless the `method` field specifies a different method. Additional request +headers can be set through the `header` field which is a map of lists of +strings, such as `{"x-foo": ["bar", "baz"]}`. + +By default, HTTP checks are configured with a request timeout equal to 10 seconds. +To configure a custom HTTP check timeout value, +specify the `timeout` field in the check definition. +The output of an HTTP check is limited to approximately 4KB. +Larger outputs are truncated. +HTTP checks also support TLS. By default, a valid TLS certificate is expected. +Certificate verification can be turned off by setting the `tls_skip_verify` +field to `true` in the check definition. 
When using TLS, the SNI is implicitly +determined from the URL if it uses a hostname instead of an IP address. +You can explicitly set the SNI value by setting `tls_server_name`. + +Consul follows HTTP redirects by default. +To disable redirects, set the `disable_redirects` field to `true`. + +The following service definition file snippet is an example +of an HTTP check definition: @@ -220,7 +187,23 @@ check = { -A TCP check: +### TCP check ((#tcp-interval)) + +TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts. +If no hostname is specified, it defaults to "localhost". +The health check status is `success` if the target host accepts the connection attempt, +otherwise the status is `critical`. In the case of a hostname that +resolves to both IPv4 and IPv6 addresses, an attempt is made to both +addresses, and the first successful connection attempt results in a +successful check. This type of check should be preferred over a script that +uses `netcat` or another external process to check a simple socket operation. + +By default, TCP checks are configured with a request timeout equal to 10 seconds. +To configure a custom TCP check timeout value, +specify the `timeout` field in the check definition. + +The following service definition file snippet is an example +of a TCP check definition: @@ -232,7 +215,6 @@ check = { interval = "10s" timeout = "1s" } - ``` ```json @@ -249,7 +231,21 @@ check = { -A UDP check: +### UDP check ((#udp-interval)) + +UDP checks periodically direct the Consul agent to send UDP datagrams +to the specified IP/hostname and port, +waiting `interval` amount of time between attempts. +The check status is set to `success` if any response is received from the targeted UDP server. +Any other result sets the status to `critical`. + +By default, UDP checks are configured with a request timeout equal to 10 seconds. 
+To configure a custom UDP check timeout value, +specify the `timeout` field in the check definition. +If any timeout on read exists, the check is still considered healthy. + +The following service definition file snippet is an example +of a UDP check definition: @@ -261,7 +257,6 @@ check = { interval = "10s" timeout = "1s" } - ``` ```json @@ -278,7 +273,32 @@ check = { -A TTL check: +### Time to live (TTL) check ((#ttl)) + +TTL checks retain their last known state for the specified `ttl` duration. +If the `ttl` duration elapses before a new check update +is provided over the HTTP interface, +the check is set to `critical` state. + +This mechanism relies on the application to directly report its health. +For example, a healthy app can periodically `PUT` a status update to the HTTP endpoint. +Then, if the app is disrupted and unable to perform this update +before the TTL expires, the health check enters the `critical` state. +The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass), +[warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail), +and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their +last known status to disk. This persistence allows the Consul agent to restore the last known +status of the check across agent restarts. Persisted check status is valid through the +end of the TTL from the time of the last check. + +To manually mark a service unhealthy, +it is far more convenient to use the maintenance mode +[CLI command](/commands/maint) or +[HTTP API endpoint](/api-docs/agent#enable-maintenance-mode) +rather than a TTL health check with arbitrarily high `ttl`. 
+ +The following service definition file snippet is an example +of a TTL check definition: @@ -304,7 +324,24 @@ check = { -A Docker check: +### Docker check ((#docker-interval)) + +These checks depend on periodically invoking an external application that +is packaged within a Docker Container. The application is triggered within the running +container through the Docker Exec API. We expect that the Consul agent user has access +to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to +determine the Docker API endpoint. The application is expected to run, perform a health +check of the service running inside the container, and exit with an appropriate exit code. +The check should be paired with an invocation interval. The shell on which the check +has to be performed is configurable, making it possible to run containers which +have different shells on the same host. +The output of a Docker check is limited to 4KB. +Larger outputs are truncated. +The agent must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) +set to `true` in order to enable Docker health checks. + +The following service definition file snippet is an example +of a Docker check definition: @@ -334,7 +371,26 @@ check = { -A gRPC check for the whole application: +### gRPC check ((#grpc-interval)) + +gRPC checks are intended for applications that support the standard +[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). +The state of the check will be updated by periodically probing the configured endpoint, +waiting `interval` amount of time between attempts. + +By default, gRPC checks are configured with a timeout equal to 10 seconds. +To configure a custom gRPC check timeout value, +specify the `timeout` field in the check definition. + +gRPC checks default to not using TLS. +To enable TLS, set `grpc_use_tls` in the check definition. 
+If TLS is enabled, then by default, a valid TLS certificate is expected. +Certificate verification can be turned off by setting the +`tls_skip_verify` field to `true` in the check definition. +To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`. + +The following service definition file snippet is an example +of a gRPC check for a whole application: @@ -362,7 +418,8 @@ check = { -A gRPC check for the specific `my_service` service: +The following service definition file snippet is an example +of a gRPC check for the specific `my_service` service: @@ -390,7 +447,23 @@ check = { -A h2ping check: +### H2ping check ((#h2ping-interval)) + +H2ping checks test an endpoint that uses HTTP/2 by connecting to the endpoint +and sending a ping frame, waiting `interval` amount of time between attempts. +If the ping is successful within a specified timeout, +then the check status is set to `success`. + +By default, h2ping checks are configured with a request timeout equal to 10 seconds. +To configure a custom h2ping check timeout value, +specify the `timeout` field in the check definition. + +TLS is enabled by default. +To disable TLS and use h2c, set `h2ping_use_tls` to `false`. +If TLS is not disabled, a valid certificate is required unless `tls_skip_verify` is set to `true`. + +The following service definition file snippet is an example +of an h2ping check definition: @@ -418,7 +491,29 @@ check = { -An alias check for a local service: +### Alias check + +These checks alias the health state of another registered +node or service. The state of the check updates asynchronously, but is +nearly instant. For aliased services on the same agent, the local state is monitored +and no additional network resources are consumed. For other services and nodes, +the check maintains a blocking query over the agent's connection with a current +server and allows stale requests. 
If there are any errors in watching the aliased +node or service, the check state is set to `critical`. +For the blocking query, the check uses the ACL token set on the service or check definition. +If no ACL token is set in the service or check definition, +the blocking query uses the agent's default ACL token +([`acl.tokens.default`](/docs/agent/config/config-files#acl_tokens_default)). + +~> **Configuration info**: The alias check configuration expects the alias to be +registered on the same agent as the one you are aliasing. If the service is +not registered with the same agent, `"alias_node": ""` must also be +specified. When using `alias_node`, if no service is specified, the check will +alias the health of the node. If a service is specified, the check will alias +the specified service on this particular node. + +The following service definition file snippet is an example +of an alias check for a local service: @@ -440,72 +535,137 @@ check = { -~> Configuration info: The alias check configuration expects the alias to be -registered on the same agent as the one you are aliasing. If the service is -not registered with the same agent, `"alias_node": ""` must also be -specified. When using `alias_node`, if no service is specified, the check will -alias the health of the node. If a service is specified, the check will alias -the specified service on this particular node. +## Check definition -Each type of definition must include a `name` and may optionally provide an -`id` and `notes` field. The `id` must be unique per _agent_ otherwise only the -last defined check with that `id` will be registered. If the `id` is not set -and the check is embedded within a service definition a unique check id is -generated. Otherwise, `id` will be set to `name`. If names might conflict, -unique IDs should be provided. +This section covers some of the most common options for check definitions. 
+For a complete list of all check options, refer to the +[Register Check HTTP API endpoint documentation](/api-docs/agent/check#json-request-body-schema). -The `notes` field is opaque to Consul but can be used to provide a human-readable -description of the current state of the check. Similarly, an external process -updating a TTL check via the HTTP interface can set the `notes` value. +-> **Casing for check options:** + The correct casing for an option depends on whether the check is defined in + a service definition file or an HTTP API JSON request body. + For example, the option `deregister_critical_service_after` in a service + definition file is instead named `DeregisterCriticalServiceAfter` in an + HTTP API JSON request body. -Checks may also contain a `token` field to provide an ACL token. This token is -used for any interaction with the catalog for the check, including -[anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration. -For Alias checks, this token is used if a remote blocking query is necessary -to watch the state of the aliased node or service. +#### General options -Script, TCP, UDP, HTTP, Docker, and gRPC checks must include an `interval` field. This -field is parsed by Go's `time` package, and has the following -[formatting specification](https://golang.org/pkg/time/#ParseDuration): +- `name` `(string: )` - Specifies the name of the check. -> A duration string is a possibly signed sequence of decimal numbers, each with -> optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". -> Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". +- `id` `(string: "")` - Specifies a unique ID for this check on this node. + + If unspecified, Consul defines the check id by: + - If the check definition is embedded within a service definition file, + a unique check id is auto-generated. + - Otherwise, the `id` is set to the value of `name`. 
+ If names might conflict, you must provide unique IDs to avoid + overwriting existing checks with the same id on this node. -In Consul 0.7 and later, checks that are associated with a service may also contain -an optional `deregister_critical_service_after` field, which is a timeout in the -same Go time format as `interval` and `ttl`. If a check is in the critical state -for more than this configured value, then its associated service (and all of its -associated checks) will automatically be deregistered. The minimum timeout is 1 -minute, and the process that reaps critical services runs every 30 seconds, so it -may take slightly longer than the configured timeout to trigger the deregistration. -This should generally be configured with a timeout that's much, much longer than -any expected recoverable outage for the given service. +- `interval` `(string: )` - Specifies + the frequency at which to run this check. + Required for all check types except TTL and alias checks. -To configure a check, either provide it as a `-config-file` option to the -agent or place it inside the `-config-dir` of the agent. The file must -end in a ".json" or ".hcl" extension to be loaded by Consul. Check definitions -can also be updated by sending a `SIGHUP` to the agent. Alternatively, the -check can be registered dynamically using the [HTTP API](/api). + The value is parsed by Go's `time` package, and has the following + [formatting specification](https://golang.org/pkg/time/#ParseDuration): -## Check Scripts + > A duration string is a possibly signed sequence of decimal numbers, each with + > optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". + > Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". -A check script is generally free to do anything to determine the status -of the check. 
The only limitations placed are that the exit codes must obey -this convention: +- `service_id` `(string: )` - Specifies + the ID of a service instance to associate this check with. + That service instance must be on this node. + If not specified, this check is treated as a node-level check. + For more information, refer to the + [service-bound checks](#service-bound-checks) section. -- Exit code 0 - Check is passing -- Exit code 1 - Check is warning -- Any other code - Check is failing +- `status` `(string: "")` - Specifies the initial status of the health check as + "critical" (default), "warning", or "passing". For more details, refer to + the [initial health check status](#initial-health-check-status) section. + + -> **Health defaults to critical:** If health status is not initially specified, + it defaults to "critical" to protect against including a service + in discovery results before it is ready. -This is the only convention that Consul depends on. Any output of the script -will be captured and stored in the `output` field. +- `deregister_critical_service_after` `(string: "")` - If specified, + the associated service and all its checks are deregistered + after this check is in the critical state for more than the specified value. + The value has the same formatting specification as the [`interval`](#interval) field. -In Consul 0.9.0 and later, the agent must be configured with -[`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) set to `true` -in order to enable script checks. + The minimum timeout is 1 minute, + and the process that reaps critical services runs every 30 seconds, + so it may take slightly longer than the configured timeout to trigger the deregistration. + This field should generally be configured with a timeout that's significantly longer than + any expected recoverable outage for the given service. -## Initial Health Check Status +- `notes` `(string: "")` - Provides a human-readable description of the check. 
+ This field is opaque to Consul and can be used however is useful to the user. + For example, it could be used to describe the current state of the check. + +- `token` `(string: "")` - Specifies an ACL token used for any interaction + with the catalog for the check, including + [anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration. + + For alias checks, this token is used if a remote blocking query is necessary to watch the state of the aliased node or service. + +#### Success/failures before passing/warning/critical + +To prevent flapping health checks and limit the load they cause on the cluster, +a health check may be configured to become passing/warning/critical only after a +specified number of consecutive checks return as passing/critical. +The status does not transition states until the configured threshold is reached. + +- `success_before_passing` - Number of consecutive successful results required + before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0. + +- `failures_before_warning` - Number of consecutive unsuccessful results required + before check status transitions to warning. Defaults to the same value as that of + `failures_before_critical` to maintain the expected behavior of not changing the + status of service checks to `warning` before `critical` unless configured to do so. + Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0. + +- `failures_before_critical` - Number of consecutive unsuccessful results required + before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0. + +This feature is available for all check types except TTL and alias checks. +By default, both passing and critical thresholds are set to 0 so the check +status always reflects the last check result. 
+ + + +```hcl +checks = [ + { + name = "HTTP TCP on port 80" + tcp = "localhost:80" + interval = "10s" + timeout = "1s" + success_before_passing = 3 + failures_before_warning = 1 + failures_before_critical = 3 + } +] +``` + +```json +{ + "checks": [ + { + "name": "HTTP TCP on port 80", + "tcp": "localhost:80", + "interval": "10s", + "timeout": "1s", + "success_before_passing": 3, + "failures_before_warning": 1, + "failures_before_critical": 3 + } + ] +} +``` + + + +## Initial health check status By default, when checks are registered against a Consul agent, the state is set immediately to "critical". This is useful to prevent services from being @@ -576,13 +736,13 @@ In the above configuration, if the web-app health check begins failing, it will only affect the availability of the web-app service. All other services provided by the node will remain unchanged. -## Agent Certificates for TLS Checks +## Agent certificates for TLS checks The [enable_agent_tls_for_checks](/docs/agent/config/config-files#enable_agent_tls_for_checks) agent configuration option can be utilized to have HTTP or gRPC health checks to use the agent's credentials when configured for TLS. -## Multiple Check Definitions +## Multiple check definitions Multiple check definitions can be defined using the `checks` (plural) key in your configuration file. @@ -640,58 +800,3 @@ checks = [ ``` - -## Success/Failures before passing/warning/critical - -To prevent flapping health checks, and limit the load they cause on the cluster, -a health check may be configured to become passing/warning/critical only after a -specified number of consecutive checks return passing/critical. -The status will not transition states until the configured threshold is reached. - -- `success_before_passing` - Number of consecutive successful results required - before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0. 
-- `failures_before_warning` - Number of consecutive unsuccessful results required - before check status transitions to warning. Defaults to the same value as that of - `failures_before_critical` to maintain the expected behavior of not changing the - status of service checks to `warning` before `critical` unless configured to do so. - Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0. -- `failures_before_critical` - Number of consecutive unsuccessful results required - before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0. - -This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks. -By default, both passing and critical thresholds will be set to 0 so the check -status will always reflect the last check result. - - - -```hcl -checks = [ - { - name = "HTTP TCP on port 80" - tcp = "localhost:80" - interval = "10s" - timeout = "1s" - success_before_passing = 3 - failures_before_warning = 1 - failures_before_critical = 3 - } -] -``` - -```json -{ - "checks": [ - { - "name": "HTTP TCP on port 80", - "tcp": "localhost:80", - "interval": "10s", - "timeout": "1s", - "success_before_passing": 3, - "failures_before_warning": 1, - "failures_before_critical": 3 - } - ] -} -``` - -