A _service mesh_ provides benefits for all organizations, ranging from security to improved application resiliency.
Some of the benefits of a _service mesh_ include:
- service discovery
- application health monitoring
- load balancing
- automatic failover
- traffic management
- encryption
- observability and traceability
- authentication and authorization
- network automation
A common use case for a _service mesh_ is to achieve a [_zero trust_ model](/use-cases/zero-trust-networking).
In a _zero trust_ model, applications require identity-based access to ensure all communication within the service mesh is authenticated with TLS certificates and encrypted in transit.
## How does a Service Mesh work?
A _service mesh_ typically consists of a control plane and a data plane. The control plane maintains a central registry that keeps track of all services and their respective IP addresses. This is called _service discovery_.
As long as the application is registered with the control plane, the control plane will be able to share with other members of the mesh how to communicate with the application and enforce rules for who can communicate with each other.
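
The registry behavior described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular mesh implements its control plane. The `ServiceRegistry` class, its method names, and the addresses are all hypothetical.

```python
class ServiceRegistry:
    """Toy stand-in for a control plane's central registry (hypothetical)."""

    def __init__(self) -> None:
        # Maps a service name to the set of "ip:port" addresses of its instances.
        self._instances: dict[str, set[str]] = {}

    def register(self, service: str, address: str) -> None:
        """Record a new instance of a service."""
        self._instances.setdefault(service, set()).add(address)

    def deregister(self, service: str, address: str) -> None:
        """Forget an instance, for example when its VM or container stops."""
        self._instances.get(service, set()).discard(address)

    def discover(self, service: str) -> list[str]:
        """Return the known addresses for a service, sorted for stable output."""
        return sorted(self._instances.get(service, set()))


registry = ServiceRegistry()
registry.register("web", "10.0.0.5:8080")
registry.register("web", "10.0.0.6:8080")
registry.register("db", "10.0.1.9:5432")

# Any member of the mesh can ask the registry where "web" lives.
web_addresses = registry.discover("web")
```

A caller only needs the service name. The registry, not the caller, owns the mapping from names to addresses.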
The control plane is responsible for securing the mesh, facilitating service discovery, health checking, policy enforcement, and other similar operational concerns. The data plane handles communication between services.
Many _service mesh_ solutions employ a sidecar proxy to handle data plane communications, and thus limit the level of awareness the services need to have about the network environment.
## API Gateway vs Service Mesh
An API gateway is a centralized access point for handling incoming client requests and delivering them to services.
The API Gateway acts as a control plane that allows operators and developers to manage incoming client requests and apply different handling logic depending on the request.
The API Gateway will route the incoming requests to the respective service. An API Gateway's primary function is to handle requests and return the reply from the service back to the client.
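
As a rough illustration of request routing at a gateway, the sketch below maps URL path prefixes to service names and picks the longest matching prefix. The route table, the service names, and the `route_request` function are hypothetical. Real gateways usually also match on hosts, headers, and methods.

```python
from typing import Optional


def route_request(routes: dict[str, str], path: str) -> Optional[str]:
    """Return the service for the longest matching path prefix, or None."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return routes[max(matches, key=len)]


# Hypothetical route table: path prefix -> backing service.
routes = {
    "/api": "catalog",
    "/api/orders": "orders",
}

orders_target = route_request(routes, "/api/orders/42")
catalog_target = route_request(routes, "/api/items")
unknown_target = route_request(routes, "/health")
```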
A _service mesh_ specializes in the network management of services and the communication between services.
The mesh is responsible for keeping track of services and their health status, IP addresses, and traffic routing, and for ensuring all the traffic between services is authenticated and encrypted.
Unlike API Gateways, a _service mesh_ will track all registered services' lifecycle and ensure requests are routed to healthy instances of the service.
API Gateways are frequently deployed alongside a load balancer to ensure traffic is directed to healthy and available instances of the service.
The mesh reduces the load balancer footprint because routing responsibilities are handled in a decentralized manner.
-> **Note**: API Gateways are frequently used to accept north-south traffic. North-south traffic is network traffic that either enters or exits a datacenter or a virtual private cloud (VPC).
A _service mesh_ is primarily used for handling east-west traffic. East-west traffic traditionally remains inside a datacenter or a VPC.
## What problems does a Service Mesh solve?
Modern infrastructure is transitioning from being primarily static to being dynamic (ephemeral).
This dynamic infrastructure has a short life cycle, meaning virtual machines (VMs) and containers are frequently recycled.
It's difficult for an organization to manage and keep track of application services that live on short-lived resources. A _service mesh_ solves this problem by acting as a central registry of all registered services.
As service instances, either VMs or containers, come up and down, the mesh is aware of their state and availability. The ability to conduct _service discovery_ is the foundation for solving the other problems a _service mesh_ addresses.
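
One way to picture how a mesh could stay aware of short-lived instances is a heartbeat with a time-to-live (TTL): an instance that stops reporting is treated as unavailable. The sketch below assumes that approach for illustration. Actual meshes use a variety of health-check mechanisms, and the `HealthTracker` class and its parameters are hypothetical.

```python
class HealthTracker:
    """Tracks instance liveness from heartbeats (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # Maps (service, address) to the time of the most recent heartbeat.
        self._last_seen: dict[tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str, now: float) -> None:
        """Record that an instance reported in at time `now`."""
        self._last_seen[(service, address)] = now

    def healthy_instances(self, service: str, now: float) -> list[str]:
        """Return instances whose last heartbeat is within the TTL."""
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self.ttl_seconds
        )


tracker = HealthTracker(ttl_seconds=10.0)
tracker.heartbeat("web", "10.0.0.5:8080", now=0.0)
tracker.heartbeat("web", "10.0.0.6:8080", now=8.0)

# At t=9 both instances are within the TTL.
healthy_at_9 = tracker.healthy_instances("web", now=9.0)
# At t=15 the first instance has been silent for 15 seconds and drops out.
healthy_at_15 = tracker.healthy_instances("web", now=15.0)
```

Passing `now` explicitly keeps the sketch deterministic. A real implementation would read a clock instead.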
As a service mesh is aware of the state of a service and its instances, the mesh can implement more intelligent and dynamic network routing.
Many service meshes offer L7 traffic management capabilities. As a result, operators and developers can create powerful rules to direct network traffic as needed, such as load balancing, traffic splitting, dynamic failover, and custom resolvers.
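
To make traffic splitting and failover concrete, the sketch below sends roughly 90% of requests to one version of a service and 10% to another, and falls back to whatever is still healthy when a backend disappears. The weights, service names, and the `pick_backend` function are hypothetical and do not reflect any specific mesh's configuration format.

```python
import random


def pick_backend(splits: dict[str, int], healthy: set[str], rng: random.Random) -> str:
    """Pick a healthy backend with probability proportional to its weight."""
    # Failover: ignore backends that are unhealthy or carry zero weight.
    candidates = {
        name: weight
        for name, weight in splits.items()
        if name in healthy and weight > 0
    }
    if not candidates:
        raise LookupError("no healthy backend available")
    names = list(candidates)
    weights = [candidates[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(7)  # fixed seed so the sketch is repeatable
splits = {"web-v1": 90, "web-v2": 10}

# With both versions healthy, about one request in ten reaches web-v2.
picks = [pick_backend(splits, {"web-v1", "web-v2"}, rng) for _ in range(1000)]
canary_share = picks.count("web-v2") / len(picks)

# If web-v1 becomes unhealthy, all traffic fails over to web-v2.
failover_pick = pick_backend(splits, {"web-v2"}, rng)
```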
A service mesh's dynamic network behavior allows application owners to improve application resiliency and availability with no application changes.
Implementing dynamic network behavior is critical as more and more applications are deployed across different cloud providers (multi-cloud) and private datacenters.
Organizations may need to route network traffic to other infrastructure environments. Ensuring this traffic is secure is top of mind for all organizations.
Service meshes offer the ability to enforce mutual TLS (mTLS), which encrypts network traffic and authenticates both sides of every connection between services. The _service mesh_ can automatically generate a TLS certificate for each service and its instances.
The certificate authenticates the service to other services inside the mesh and encrypts the TCP/UDP/gRPC connection with TLS.
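
The "mutual" part of mTLS means the server also demands a certificate from the client. The sketch below shows that idea with Python's standard `ssl` module. In a mesh, this work is typically done by the proxy on the service's behalf rather than by application code. The `build_mtls_server_context` function and its file path parameters are hypothetical, and the certificate files are assumed to be issued by the mesh.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Requiring a client certificate is what makes the handshake mutual.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # Trust only client certificates signed by this certificate authority.
        context.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        # Present this service's own certificate to connecting clients.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Without file paths, the context is configured but holds no certificates yet.
context = build_mtls_server_context()
```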
<!-- As mentioned earlier, the _service mesh_ is aware of all services and their respective state and can require authentication between all service to service communication. -->
Fine-grained policies that dictate which services are allowed to communicate with each other are another benefit of a _service mesh_.
Traditionally, services are permitted to communicate with other services through firewall rules.
The traditional IP-based firewall model is difficult to enforce when infrastructure resources are dynamic, have short lifecycles, and frequently recycle IP addresses.
As a result, network administrators have to open up network ranges to permit network traffic between services without differentiating the services generating the network traffic. However, a _service mesh_ allows operators and developers to shift away from an IP-based model and focus more on service-to-service permissions.
An operator defines a policy that only allows _service A_ to communicate with _service B_. Otherwise, the default action is to deny the traffic.
This shift from an IP address-based security model to a service-focused model reduces the overhead of securing network traffic and allows an organization to take advantage of multi-cloud environments without sacrificing security due to complexity.
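
A service-to-service policy with a default-deny action can be sketched as a set of allowed (source, destination) pairs keyed by service identity rather than by IP address. The `ServicePolicy` class and the service names are hypothetical, and real meshes express such rules in their own configuration formats.

```python
class ServicePolicy:
    """Default-deny policy keyed by service identity (hypothetical sketch)."""

    def __init__(self) -> None:
        # Each entry permits traffic from one named service to another.
        self._allowed: set[tuple[str, str]] = set()

    def allow(self, source: str, destination: str) -> None:
        """Permit traffic from `source` to `destination`."""
        self._allowed.add((source, destination))

    def is_allowed(self, source: str, destination: str) -> bool:
        """Anything not explicitly allowed is denied."""
        return (source, destination) in self._allowed


policy = ServicePolicy()
policy.allow("service-a", "service-b")

a_to_b = policy.is_allowed("service-a", "service-b")  # explicitly allowed
b_to_a = policy.is_allowed("service-b", "service-a")  # no rule in this direction
c_to_b = policy.is_allowed("service-c", "service-b")  # no rule for service-c
```

Because the rule names services instead of addresses, it stays valid when instances are recycled and their IP addresses change.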