diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 2a3a86c244..31511c6595 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -135,3 +135,11 @@ Consul currently uses Go Modules for vendoring. Please only apply the minimal vendor changes to get your PR to work. Consul does not attempt to track the latest version for each dependency. + +## Checklists + +Some common changes that many PRs require, such as adding config fields, are +documented through checklists. + +Please check in `contributing/` for any `checklist-*.md` files that might help +with your change. diff --git a/INTERNALS.md b/INTERNALS.md index cb1cbe4de3..896226cb9c 100644 --- a/INTERNALS.md +++ b/INTERNALS.md @@ -1,76 +1 @@ -# Consul Internals Guide - -This guide is intended to help folks who want to understand more about how Consul works from a code perspective, or who are thinking about contributing to Consul. For a high level overview of Consul's design, please see the [Consul Architecture Guide](https://www.consul.io/docs/internals/architecture.html) as a starting point. - -## Architecture Overview - -Consul is designed around the concept of a [Consul Agent](https://www.consul.io/docs/agent/basics.html). The agent is deployed as a single Go binary and runs on every node in a cluster. - -A small subset of agents, usually 3 to 7, run in server mode and participate in the [Raft Consensus Protocol](https://www.consul.io/docs/internals/consensus.html). The Consul servers hold a consistent view of the state of the cluster, including the [service catalog](https://www.consul.io/api/catalog.html) and the [health state of services and nodes](https://www.consul.io/api/health.html) as well as other items like Consul's [key/value store contents](https://www.consul.io/api/kv.html). An agent in server mode is a superset of the client capabilities that follow. - -All the remaining agents in a cluster run in client mode. 
Applications on client nodes use their local agent in client mode to [register services](https://www.consul.io/api/agent.html) and to discover other services or interact with the key/value store. For the latter queries, the agent sends RPC requests internally to one of the Consul servers for the information. None of the key/value data is on any of the client agents, for example, it's always fetched on the fly from a Consul server. - -Both client and server mode agents participate in a [Gossip Protocol](https://www.consul.io/docs/internals/gossip.html) which provides two important mechanisms. First, it allows for agents to learn about all the other agents in the cluster, just by joining initially with a single existing member of the cluster. This allows clients to discover new Consul servers. Second, the gossip protocol provides a distributed failure detector, whereby the agents in the cluster randomly probe each other at regular intervals. Because of this failure detector, Consul can run health checks locally on each agent and just sent edge-triggered updates when the state of a health check changes, confident that if the agent dies altogether then the cluster will detect that. This makes Consul's health checking design very scaleable compared to centralized systems with a central polling type of design. - -There are many other aspects of Consul that are well-covered in Consul's [Internals Guides](https://www.consul.io/docs/internals/index.html). - -## Source Code Layout - -### Shared Components - -The components in this section are shared between Consul agents in client and server modes. - -| Directory | Contents | -| --------- | -------- | -| [command/agent](https://github.com/hashicorp/consul/tree/master/command/agent) | This contains the actual CLI command implementation for the `consul agent` command, which mostly just invokes an agent object from the `agent` package. 
| -| [agent](https://github.com/hashicorp/consul/tree/master/agent) | This is where the agent object is defined, and the top level `agent` package has all of the functionality that's common to both client and server agents. This includes things like service registration, the HTTP and DNS endpoints, watches, and top-level glue for health checks. | -| [agent/config](https://github.com/hashicorp/consul/tree/master/agent/config) | This has all the user-facing [configuration](https://www.consul.io/docs/agent/options.html) processing code, as well as the internal configuration structure that's used by the agent. | -| [agent/checks](https://github.com/hashicorp/consul/tree/master/agent/checks) | This has implementations for the different [health check types](https://www.consul.io/docs/agent/checks.html). | -| [agent/ae](https://github.com/hashicorp/consul/tree/master/agent/ae), [agent/local](https://github.com/hashicorp/consul/tree/master/agent/local) | These are used together to power the agent's [Anti-Entropy Sync Back](https://www.consul.io/docs/internals/anti-entropy.html) process to the Consul servers. | -| [agent/router](https://github.com/hashicorp/consul/tree/master/agent/router), [agent/pool](https://github.com/hashicorp/consul/tree/master/agent/pool) | These are used for routing RPC queries to Consul servers and for connection pooling. | -| [agent/structs](https://github.com/hashicorp/consul/tree/master/agent/structs) | This has definitions of all the internal RPC protocol request and response structures. | - -### Server Components - -The components in this section are only used by Consul servers. - -| Directory | Contents | -| --------- | -------- | -| [agent/consul](https://github.com/hashicorp/consul/tree/master/agent/consul) | This is where the Consul server object is defined, and the top-level `consul` package has all of the functionality that's used by server agents. This includes things like the internal RPC endpoints. 
| -| [agent/consul/fsm](https://github.com/hashicorp/consul/tree/master/agent/consul/fsm), [agent/consul/state](https://github.com/hashicorp/consul/tree/master/agent/consul/state) | These components make up Consul's finite state machine (updated by the Raft consensus algorithm) and backed by the state store (based on immutable radix trees). All updates of Consul's consistent state is handled by the finite state machine, and all read queries to the Consul servers are serviced by the state store's data structures. | -| [agent/consul/autopilot](https://github.com/hashicorp/consul/tree/master/agent/consul/autopilot) | This contains a package of functions that provide Consul's [Autopilot](https://www.consul.io/docs/guides/autopilot.html) features. | - -### Other Components - -There are several other top-level packages used internally by Consul as well as externally by other applications. - -| Directory | Contents | -| --------- | -------- | -| [acl](https://github.com/hashicorp/consul/tree/master/api) | This supports the underlying policy engine for Consul's [ACL](https://www.consul.io/docs/guides/acl.html) system. | -| [api](https://github.com/hashicorp/consul/tree/master/api) | This `api` package provides an official Go API client for Consul, which is also used by Consul's [CLI](https://www.consul.io/docs/commands/index.html) commands to communicate with the local Consul agent. | -| [command](https://github.com/hashicorp/consul/tree/master/command) | This contains a sub-package for each of Consul's [CLI](https://www.consul.io/docs/commands/index.html) command implementations. | -| [snapshot](https://github.com/hashicorp/consul/tree/master/snapshot) | This has implementation details for Consul's [snapshot archives](https://www.consul.io/api/snapshot.html). 
| -| [api/watch](https://github.com/hashicorp/consul/tree/master/api/watch) | This has implementation details for Consul's [watches](https://www.consul.io/docs/agent/watches.html), used both internally to Consul and by the [watch CLI command](https://www.consul.io/docs/commands/watch.html). | -| [website](https://github.com/hashicorp/consul/tree/master/website) | This has the full source code for [consul.io](https://www.consul.io/). Pull requests can update the source code and Consul's documentation all together. | - -## FAQ - -This section addresses some frequently asked questions about Consul's architecture. - -### How does eventually-consistent gossip relate to the Raft consensus protocol? - -When you query Consul for information about a service, such as via the [DNS interface](https://www.consul.io/docs/agent/dns.html), the agent will always make an internal RPC request to a Consul server that will query the consistent state store. Even though an agent might learn that another agent is down via gossip, that won't be reflected in service discovery until the current Raft leader server perceives that through gossip and updates the catalog using Raft. You can see an example of where these layers are plumbed together here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/consul/leader.go#L559-L602. - -## Why does a blocking query sometimes return with identical results? - -Consul's [blocking queries](https://www.consul.io/api/index.html#blocking-queries) make a best-effort attempt to wait for new information, but they may return the same results as the initial query under some circumstances. First, queries are limited to 10 minutes max, so if they time out they will return. Second, due to Consul's prefix-based internal immutable radix tree indexing, there may be modifications to higher-level nodes in the radix tree that cause spurious wakeups. 
In particular, waiting on things that do not exist is not very efficient, but not very expensive for Consul to serve, so we opted to keep the code complexity low and not try to optimize for that case. You can see the common handler that implements the blocking query logic here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/consul/rpc.go#L361-L439. For more on the immutable radix tree implementation, see https://github.com/hashicorp/go-immutable-radix/ and https://github.com/hashicorp/go-memdb, and the general support for "watches". - -### Do the client agents store any key/value entries? - -No. These are always fetched via an internal RPC request to a Consul server. The agent doesn't do any caching, and if you want to be able to fetch these values even if there's no cluster leader, then you can use a more relaxed [consistency mode](https://www.consul.io/api/index.html#consistency-modes). You can see an example where the `/v1/kv/` HTTP endpoint on the agent makes an internal RPC call here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/kvs_endpoint.go#L56-L90. - -### I don't want to run a Consul agent on every node, can I just run servers with a load balancer in front? - -We strongly recommend running the Consul agent on each node in a cluster. Even the key/value store benefits from having agents on each node. For example, when you lock a key it's done through a [session](https://www.consul.io/docs/internals/sessions.html), which has a lifetime that's by default tied to the health of the agent as determined by Consul's gossip-based distributed failure detector. If the agent dies, the session will be released automatically, allowing some other process to quickly see that and obtain the lock without having to wait for an open-ended TTL to expire. 
If you are using Consul's service discovery features, the local agent runs the health checks for each service registered on that node and only needs to send edge-triggered updates to the Consul servers (because gossip will determine if the agent itself dies). Most attempts to avoid running an agent on each node will face solving issues that are already solved by Consul's design if the agent is deployed as intended. - -For cases where you really cannot run an agent alongside a service, such as for monitoring an [external service](https://www.consul.io/docs/guides/external.html), there's a companion project called the [Consul External Service Monitor](https://github.com/hashicorp/consul-esm) that may help. +Moved to [contributing/INTERNALS.md](contributing/INTERNALS.md). \ No newline at end of file diff --git a/contributing/INTERNALS.md b/contributing/INTERNALS.md new file mode 100644 index 0000000000..cb1cbe4de3 --- /dev/null +++ b/contributing/INTERNALS.md @@ -0,0 +1,76 @@ +# Consul Internals Guide + +This guide is intended to help folks who want to understand more about how Consul works from a code perspective, or who are thinking about contributing to Consul. For a high level overview of Consul's design, please see the [Consul Architecture Guide](https://www.consul.io/docs/internals/architecture.html) as a starting point. + +## Architecture Overview + +Consul is designed around the concept of a [Consul Agent](https://www.consul.io/docs/agent/basics.html). The agent is deployed as a single Go binary and runs on every node in a cluster. + +A small subset of agents, usually 3 to 7, run in server mode and participate in the [Raft Consensus Protocol](https://www.consul.io/docs/internals/consensus.html). 
The Consul servers hold a consistent view of the state of the cluster, including the [service catalog](https://www.consul.io/api/catalog.html) and the [health state of services and nodes](https://www.consul.io/api/health.html) as well as other items like Consul's [key/value store contents](https://www.consul.io/api/kv.html). An agent in server mode is a superset of the client capabilities that follow. + +All the remaining agents in a cluster run in client mode. Applications on client nodes use their local agent in client mode to [register services](https://www.consul.io/api/agent.html) and to discover other services or interact with the key/value store. For the latter queries, the agent sends RPC requests internally to one of the Consul servers for the information. None of the key/value data is stored on the client agents, for example; it's always fetched on the fly from a Consul server. + +Both client and server mode agents participate in a [Gossip Protocol](https://www.consul.io/docs/internals/gossip.html) which provides two important mechanisms. First, it allows agents to learn about all the other agents in the cluster, just by joining initially with a single existing member of the cluster. This allows clients to discover new Consul servers. Second, the gossip protocol provides a distributed failure detector, whereby the agents in the cluster randomly probe each other at regular intervals. Because of this failure detector, Consul can run health checks locally on each agent and just send edge-triggered updates when the state of a health check changes, confident that if the agent dies altogether then the cluster will detect that. This makes Consul's health checking design very scalable compared to centralized systems with a central polling type of design. + +There are many other aspects of Consul that are well-covered in Consul's [Internals Guides](https://www.consul.io/docs/internals/index.html). 
+ +## Source Code Layout + +### Shared Components + +The components in this section are shared between Consul agents in client and server modes. + +| Directory | Contents | +| --------- | -------- | +| [command/agent](https://github.com/hashicorp/consul/tree/master/command/agent) | This contains the actual CLI command implementation for the `consul agent` command, which mostly just invokes an agent object from the `agent` package. | +| [agent](https://github.com/hashicorp/consul/tree/master/agent) | This is where the agent object is defined, and the top level `agent` package has all of the functionality that's common to both client and server agents. This includes things like service registration, the HTTP and DNS endpoints, watches, and top-level glue for health checks. | +| [agent/config](https://github.com/hashicorp/consul/tree/master/agent/config) | This has all the user-facing [configuration](https://www.consul.io/docs/agent/options.html) processing code, as well as the internal configuration structure that's used by the agent. | +| [agent/checks](https://github.com/hashicorp/consul/tree/master/agent/checks) | This has implementations for the different [health check types](https://www.consul.io/docs/agent/checks.html). | +| [agent/ae](https://github.com/hashicorp/consul/tree/master/agent/ae), [agent/local](https://github.com/hashicorp/consul/tree/master/agent/local) | These are used together to power the agent's [Anti-Entropy Sync Back](https://www.consul.io/docs/internals/anti-entropy.html) process to the Consul servers. | +| [agent/router](https://github.com/hashicorp/consul/tree/master/agent/router), [agent/pool](https://github.com/hashicorp/consul/tree/master/agent/pool) | These are used for routing RPC queries to Consul servers and for connection pooling. | +| [agent/structs](https://github.com/hashicorp/consul/tree/master/agent/structs) | This has definitions of all the internal RPC protocol request and response structures. 
| + +### Server Components + +The components in this section are only used by Consul servers. + +| Directory | Contents | +| --------- | -------- | +| [agent/consul](https://github.com/hashicorp/consul/tree/master/agent/consul) | This is where the Consul server object is defined, and the top-level `consul` package has all of the functionality that's used by server agents. This includes things like the internal RPC endpoints. | +| [agent/consul/fsm](https://github.com/hashicorp/consul/tree/master/agent/consul/fsm), [agent/consul/state](https://github.com/hashicorp/consul/tree/master/agent/consul/state) | These components make up Consul's finite state machine, which is updated by the Raft consensus algorithm and backed by the state store (based on immutable radix trees). All updates of Consul's consistent state are handled by the finite state machine, and all read queries to the Consul servers are serviced by the state store's data structures. | +| [agent/consul/autopilot](https://github.com/hashicorp/consul/tree/master/agent/consul/autopilot) | This contains a package of functions that provide Consul's [Autopilot](https://www.consul.io/docs/guides/autopilot.html) features. | + +### Other Components + +There are several other top-level packages used internally by Consul as well as externally by other applications. + +| Directory | Contents | +| --------- | -------- | +| [acl](https://github.com/hashicorp/consul/tree/master/acl) | This supports the underlying policy engine for Consul's [ACL](https://www.consul.io/docs/guides/acl.html) system. | +| [api](https://github.com/hashicorp/consul/tree/master/api) | This `api` package provides an official Go API client for Consul, which is also used by Consul's [CLI](https://www.consul.io/docs/commands/index.html) commands to communicate with the local Consul agent. 
| +| [command](https://github.com/hashicorp/consul/tree/master/command) | This contains a sub-package for each of Consul's [CLI](https://www.consul.io/docs/commands/index.html) command implementations. | +| [snapshot](https://github.com/hashicorp/consul/tree/master/snapshot) | This has implementation details for Consul's [snapshot archives](https://www.consul.io/api/snapshot.html). | +| [api/watch](https://github.com/hashicorp/consul/tree/master/api/watch) | This has implementation details for Consul's [watches](https://www.consul.io/docs/agent/watches.html), used both internally to Consul and by the [watch CLI command](https://www.consul.io/docs/commands/watch.html). | +| [website](https://github.com/hashicorp/consul/tree/master/website) | This has the full source code for [consul.io](https://www.consul.io/). Pull requests can update the source code and Consul's documentation all together. | + +## FAQ + +This section addresses some frequently asked questions about Consul's architecture. + +### How does eventually-consistent gossip relate to the Raft consensus protocol? + +When you query Consul for information about a service, such as via the [DNS interface](https://www.consul.io/docs/agent/dns.html), the agent will always make an internal RPC request to a Consul server that will query the consistent state store. Even though an agent might learn that another agent is down via gossip, that won't be reflected in service discovery until the current Raft leader server perceives that through gossip and updates the catalog using Raft. You can see an example of where these layers are plumbed together here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/consul/leader.go#L559-L602. + +### Why does a blocking query sometimes return with identical results? 
+ +Consul's [blocking queries](https://www.consul.io/api/index.html#blocking-queries) make a best-effort attempt to wait for new information, but they may return the same results as the initial query under some circumstances. First, queries are limited to 10 minutes max, so if they time out they will return. Second, due to Consul's prefix-based internal immutable radix tree indexing, there may be modifications to higher-level nodes in the radix tree that cause spurious wakeups. In particular, waiting on things that do not exist is not very efficient, but not very expensive for Consul to serve, so we opted to keep the code complexity low and not try to optimize for that case. You can see the common handler that implements the blocking query logic here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/consul/rpc.go#L361-L439. For more on the immutable radix tree implementation, see https://github.com/hashicorp/go-immutable-radix/ and https://github.com/hashicorp/go-memdb, and the general support for "watches". + +### Do the client agents store any key/value entries? + +No. These are always fetched via an internal RPC request to a Consul server. The agent doesn't do any caching, and if you want to be able to fetch these values even if there's no cluster leader, then you can use a more relaxed [consistency mode](https://www.consul.io/api/index.html#consistency-modes). You can see an example where the `/v1/kv/` HTTP endpoint on the agent makes an internal RPC call here - https://github.com/hashicorp/consul/blob/v1.0.5/agent/kvs_endpoint.go#L56-L90. + +### I don't want to run a Consul agent on every node, can I just run servers with a load balancer in front? + +We strongly recommend running the Consul agent on each node in a cluster. Even the key/value store benefits from having agents on each node. 
For example, when you lock a key it's done through a [session](https://www.consul.io/docs/internals/sessions.html), which has a lifetime that's by default tied to the health of the agent as determined by Consul's gossip-based distributed failure detector. If the agent dies, the session will be released automatically, allowing some other process to quickly see that and obtain the lock without having to wait for an open-ended TTL to expire. If you are using Consul's service discovery features, the local agent runs the health checks for each service registered on that node and only needs to send edge-triggered updates to the Consul servers (because gossip will determine if the agent itself dies). Most attempts to avoid running an agent on each node will end up re-solving issues that are already solved by Consul's design if the agent is deployed as intended. + +For cases where you really cannot run an agent alongside a service, such as for monitoring an [external service](https://www.consul.io/docs/guides/external.html), there's a companion project called the [Consul External Service Monitor](https://github.com/hashicorp/consul-esm) that may help. diff --git a/contributing/README.md b/contributing/README.md new file mode 100644 index 0000000000..13894394a5 --- /dev/null +++ b/contributing/README.md @@ -0,0 +1,21 @@ +# Contributing to Consul + +See [our contributing guide](../.github/CONTRIBUTING.md) to get started. + +This directory contains other helpful resources like checklists for making +certain common types of changes. + +## Checklists + +See the `checklist-*` files in this directory to see if there is a checklist that +applies to a change you wish to make. The most common one is a change to +Consul's configuration file, which has a lot more steps and subtlety than might +first appear, so please use the checklist! + +We recommend copying the raw Markdown lists into a local file or gist while you +work and checking tasks as you complete them (by filling in `[ ]` with `[x]`). 
+You can then paste the list in the PR description when you post it, indicating +the steps you've already done. If you want to post an initial draft PR before +the list is complete, please still include the list as is so reviewers can see +progress. You can then check the list off on the PR description directly in +GitHub's UI after pushing the relevant fixes up. \ No newline at end of file diff --git a/contributing/checklist-adding-config-fields.md b/contributing/checklist-adding-config-fields.md new file mode 100644 index 0000000000..42b2c45a25 --- /dev/null +++ b/contributing/checklist-adding-config-fields.md @@ -0,0 +1,206 @@ +# Adding a Consul Config Field + +This is a checklist of all the places you need to update when adding a new field +to config. There may be a few other special cases not included but this covers +the majority of configs. + +We suggest you copy the raw markdown into a gist or local file and check them +off as you go (you can mark them as done by replacing `[ ]` with `[x]` so GitHub +renders them as checked). Then **please include the completed lists you worked +through in your PR description**. + +Examples of special cases this doesn't cover are: + - If the config needs special treatment like a different default in `-dev` mode + or differences between OSS and Enterprise. + - If custom logic is needed to support backwards compatibility when changing + syntax or semantics of anything. + +There are four specific cases covered with increasing complexity: + 1. adding a simple config field only used by client agents + 1. adding a CLI flag to mirror that config field + 1. adding a config field that needs to be used in Consul servers + 1. adding a field to the Service Definition + +## Adding a Simple Config Field for Client Agents + + - [ ] Add the field to the Config struct (or an appropriate sub-struct) in + `agent/config/config.go`. + - [ ] Add the field to the actual RuntimeConfig struct in + `agent/config/runtime.go`. 
- [ ] Add an appropriate parser/setter in `agent/config/builder.go` to + translate. + - [ ] Add the new field with a random value to both the JSON and HCL blobs in + `TestFullConfig` in `agent/config/runtime_test.go`; it should fail now, then + add the same random value to the expected struct in that test so it passes + again. + - [ ] Add the new field and its default value to `TestSanitize` in the same + file. (Running the test first gives you a nice diff, which can save you + working out where everything needs to go.) + - [ ] **If** your new config field needs validation because it's only valid in + some cases or with some values (often true). + - [ ] Add validation to Validate in `agent/config/builder.go`. + - [ ] Add a test case to the table test `TestConfigFlagsAndEdgeCases` in + `agent/config/runtime_test.go`. + - [ ] **If** your new config field needs a non-zero-value default. + - [ ] Add that to `DefaultSource` in `agent/config/defaults.go`. + - [ ] Add a test case to the table test `TestConfigFlagsAndEdgeCases` in + `agent/config/runtime_test.go`. + - [ ] **If** your config should take effect on a reload/HUP. + - [ ] Add necessary code to trigger a safe (locked or atomic) update to + any state the feature needs to change. This needs to be added to one or + more of the following places: + - `ReloadConfig` in `agent/agent.go` if it needs to affect the local + client state or another client agent component. + - `ReloadConfig` in `agent/consul/client.go` if it needs to affect + state for client agent's RPC client. + - [ ] Add a test to `agent/agent_test.go` similar to others with prefix + `TestAgent_reloadConfig*`. + - [ ] **If** the new config field(s) include an array of structs or maps. + - [ ] Add the path to the call to `lib.PatchSliceOfMaps` in Parse in + `agent/config/config.go`. + - [ ] If none of the tests in `agent/config/runtime_test.go` failed before you did that, + then you didn't actually test the slice part yet; go back and add tests + that populate that slice. 
+ - [ ] Add documentation to `website/source/docs/agent/options.html.md`. + +Done! You can now use your new field in a client agent by accessing +`s.agent.Config.`. + +If you need a CLI flag, need access to the variable in a server context, or have +touched the Service Definition, make sure you continue on to follow the appropriate +checklists below. + +## Adding a CLI Flag Corresponding to the New Field +If the config field also needs a CLI flag, then follow these steps. + + - [ ] Do all of the steps in [Adding a Simple Config + Field For Client Agents](#adding-a-simple-config-field-for-client-agents). + - [ ] Add the new flag to `agent/config/flags.go`. + - [ ] Add a test case to `TestParseFlags` in `agent/config/flag_test.go`. + - [ ] Add a test case (or extend one if appropriate) to the table test + `TestConfigFlagsAndEdgeCases` in `agent/config/runtime_test.go` to ensure setting the + flag works. + - [ ] Add flag (as well as config file) documentation to + `website/source/docs/agent/options.html.md`. + +## Adding a Simple Config Field for Servers +Consul servers have a separate Config struct for historical reasons. Note that Consul +server agents are actually also client agents, so in some cases config that is +only destined for servers doesn't need to follow this checklist provided it's +only needed during the bootstrapping of the server (which is done in code shared +by both server and client components in `agent.go`). For example, WAN Gossip +configs are only valid on server agents, but since WAN Gossip is set up in +`agent.go` they don't need to follow this checklist. The simplest (and mostly +accurate) rule is: + +> If you need to access the config field from code in `agent/consul` (e.g. RPC +> endpoints), then you need to follow this. If it's only in `agent` (e.g. HTTP +> endpoints or agent startup) you don't. + +A final word of warning - **you should never need to pass config into the FSM +(`agent/consul/fsm`) or state store (`agent/consul/state`)**. 
Doing so is **_very +dangerous_** and can violate consistency guarantees and corrupt databases. If +you think you need this then please discuss the design with the Consul team +before writing code! + +Consul's server components for historical reasons don't use the `RuntimeConfig` +struct; instead they have their own struct called `Config` in `agent/consul/config.go`. + + - [ ] Do all of the steps in [Adding a Simple Config + Field For Client Agents](#adding-a-simple-config-field-for-client-agents). + - [ ] Add the new field to the Config struct in `agent/consul/config.go` + - [ ] Add code to set the values from the `RuntimeConfig` in the confusingly + named `consulConfig` method in `agent/agent.go` + - [ ] **If needed**, add a test to `agent_test.go` if there is some non-trivial + behavior in the code you added in the previous step. We tend not to test + simple assignments from one to the other since these are typically caught by + higher-level tests of the actual functionality that matters, but some examples + can be found prefixed with `TestAgent_consulConfig*` + - [ ] **If** your config should take effect on a reload/HUP + - [ ] Add necessary code to `ReloadConfig` in `agent/consul/server.go`; this + needs to be adequately synchronized with any readers of the state being + updated. + - [ ] Add a new test or a new assertion to `TestServer_ReloadConfig` + +You can now access that field from `s.srv.config.` inside an RPC +handler. + +## Adding a New Field to Service Definition +The [Service Definition](https://www.consul.io/docs/agent/services.html) syntax +appears both in Consul config files and in the `/v1/agent/service/register` +API. + +For wonderful historical reasons, our config files have always used `snake_case` +attribute names in both JSON and HCL (even before we supported HCL!), while our +API uses `CamelCase`. 
+
+Because we want documentation examples to work in both config files and API
+bodies, and to avoid needless confusion, we have to accept both `snake_case`
+and `CamelCase` field names for the service definition.
+
+Finally, adding a field to the service definition implies adding the field to
+several internal structs and to all API outputs that display services from the
+catalog. That explains the multiple layers needed below.
+
+This list assumes a new field in the base service definition struct. Adding
+new fields to health checks is similar but mostly needs the `HealthCheck`
+structs and methods updating instead. Adding fields to embedded structs like
+`ProxyConfig` is largely the same pattern but may need different test methods
+etc. updating.
+
+ - [ ] Do all of the steps in [Adding a Simple Config
+   Field For Client Agents](#adding-a-simple-config-field-for-client-agents).
+ - [ ] `agent/structs` package
+   - [ ] Add the field to `ServiceDefinition` (`service_definition.go`)
+   - [ ] Add the field to `NodeService` (`structs.go`)
+   - [ ] Add the field to `ServiceNode` (`structs.go`)
+   - [ ] Update `ServiceDefinition.ToNodeService` to translate the field
+   - [ ] Update `NodeService.ToServiceNode` to translate the field
+   - [ ] Update `ServiceNode.ToNodeService` to translate the field
+   - [ ] Update `TestStructs_ServiceNode_Conversions`
+   - [ ] Update `ServiceNode.PartialClone`
+   - [ ] Update `TestStructs_ServiceNode_PartialClone` (`structs_test.go`)
+   - [ ] If needed, update `NodeService.Validate` to ensure the field value
+     is reasonable
+     - [ ] Add a test like `TestStructs_NodeService_Validate*` in
+       `structs_test.go`
+   - [ ] Add a comparison in `NodeService.IsSame`
+     - [ ] Update `TestStructs_NodeService_IsSame`
+   - [ ] Add a comparison in `ServiceNode.IsSameService`
+     - [ ] Update `TestStructs_ServiceNode_IsSameService`
+   - [ ] **If** your field name has MultipleWords,
+     - [ ] Add it to the `aux` inline struct in
+       `ServiceDefinition.UnmarshalJSON` (`service_definition.go`).
+       - Note: if the field is embedded higher up in a nested struct,
+         follow the chain and update the necessary struct's `UnmarshalJSON`
+         method - you may need to add one if there are no other case
+         transformations being done; copy an existing example.
+       - Note: the tests that exercise this are in the agent endpoint for
+         historical reasons (this is where the translation used to happen).
+ - [ ] `agent` package
+   - [ ] Update `testAgent_RegisterService` and/or add a new test to ensure
+     your fields register correctly via the API (`agent_endpoint_test.go`)
+   - [ ] **If** your field name has MultipleWords,
+     - [ ] Update `testAgent_RegisterService_TranslateKeys` to include
+       examples with it set in `snake_case` and ensure it is parsed
+       correctly. Run this via `TestAgent_RegisterService_TranslateKeys`
+       (`agent_endpoint_test.go`).
+ - [ ] `api` package
+   - [ ] Add the field to `AgentService` (`agent.go`)
+   - [ ] Add/update an appropriate test in `agent_test.go`
+     - (Note: you need to use `make test` or ensure the `consul` binary on
+       your `$PATH` is a build with your new field - usually `make dev`
+       ensures this, unless your path is funky or you have a consul binary
+       even further up the shell's `$PATH`.)
+ - [ ] Docs
+   - [ ] Update the docs in `website/source/docs/agent/services.html.md`
+   - [ ] Consider if it's worth adding examples to feature docs or API docs
+     that show the new field's usage.
+
+Note that although the new field will show up in the API output of
+`/agent/services`, `/catalog/services` and `/health/services`, the tests for
+those endpoints currently don't exercise anything that's very useful unless
+custom logic is required, since they don't even encode the response object as
+JSON and just assert on the structs you already modified. If custom
+presentation logic is needed, tests for these endpoints might be warranted too.
It's usual to use
+`omitempty` for new fields that will typically not be used by existing
+registrations, although we don't currently test for that systematically.