One of the key characteristics of Kubernetes is scale. Several experiments show how one can scale to thousands of nodes and tens of thousands of containers. Indeed, one of the key benefits of transitioning to microservices is the ability to scale or modify specific parts of an application without updating a big monolith.
The challenge that we will discuss in this blog, though, is how to scale down a microservices-based system that has been built with the goal of supporting thousands of customers. We have one particular use case where we faced this challenge.
We have been building the Aporeto service as a set of microservices, deployed as containers in a Kubernetes cluster. This includes both our stateless services and our data stores. From the get-go we decided to offer the Aporeto product as a service, and since we needed scale and operational simplicity, Kubernetes was the top choice. As a startup with limited resources, we chose to host our Kubernetes cluster in Google’s GKE. Let Google manage the cluster, and we can focus on our applications. Easy choice.
However, as we talked with more customers, it became apparent that, since we are offering a security solution, some of them were uncomfortable with the idea of a service running in a public cloud and wanted to host the same service in their own environment. The question we had to answer was simple:
“How do we scale down a microservices architecture designed for scale?”
We had architected a service designed to handle thousands of hosts and tens of thousands of workloads at very high speeds. We used Kubernetes to solve several operational problems. We now had to scale it down and make it deployable on three or five nodes, in a form that a non-expert can install in 10 minutes or less. And the installation must succeed 100% of the time in any virtualization or network environment, without a DevOps engineer or a network engineer to oversee every aspect of the deployment.
We started evaluating several alternatives. We could re-design some parts of the software and turn the microservices into a monolith. We could use just Docker or docker-compose and maintain the microservices/container model, but avoid the complexities of Kubernetes for a small environment. After all, several people would argue that going to a Docker Swarm cluster would be ideal for these small environments. Kelsey Hightower even has an option for just running kubelet without the Kubernetes control plane.
All of the above would result in a different user experience and, more importantly, a maintenance nightmare of managing different code bases and operational models. We had invested lots of time automating the infrastructure deployment and dealing with high-availability issues, and we relied on Kubernetes capabilities and APIs.
The alternative would be to deploy a small Kubernetes cluster for our customers. The requirement, though, was that in addition to maintaining our services we would have to pre-build and package a Kubernetes deployment that is easy to install and maintain by someone who is not a Kubernetes expert. This sounded more plausible, but it had its own challenges.
We started considering what the biggest benefits of a small-scale Kubernetes deployment would be, and what the biggest challenges were.
First on the benefits, the things that we use and love from Kubernetes:
Even in a small, scaled-down environment the above are very valuable, and we have saved tons of work by relying on Kubernetes rather than re-inventing the wheel ourselves. Essentially, Kubernetes for us is not just a scheduler but an “application operating system”.
What is the biggest challenge in a small-scale Kubernetes deployment, though? The answer would not surprise anyone who has talked to Kubernetes users. Networking is complex. Even if we want just a cluster with three or five nodes and 10-20 microservices, we still need to deploy tunnels and SDN controllers, manage IP address subnets, deal with service networks and pod networks, and so on. And none of these things will ever go wrong. Right?
Our requirement is to deploy this cluster in any customer environment, with any possible networking architecture or virtualization technology, and we need to make sure that it works unattended and with predictable performance. In some instances, it might run on top of other virtualization solutions that use SDN systems and VXLAN tunnels of their own (for example, running on top of VMware with NSX), and the last thing we need is a massive network architecture that advertises BGP routes around the data center to achieve our goals. Imagine the customer's reaction to having their network reworked in order to deploy a simple security application. We also needed the cluster to work in exactly the same way in AWS, GCP, a private data center, or even an OpenStack cluster. And from a performance perspective, since we process lots of data, it should not be affected by tunneling methods that remove TCP offload capabilities from our applications.
At this point we realized that we could actually get the best of both worlds. In our microservices architecture we had already implemented a service discovery mechanism completely decoupled from any cloud provider or network architecture. One can argue that we re-invented the wheel there, but we had some custom needs that made an off-the-shelf service discovery framework too difficult to use. The benefit of this design choice was that we did not have any dependencies on a networking technology, and we did not assume that microservices must be addressed by separate IP addresses. Ideally, if Kubernetes had an option that dropped the IP-per-pod requirement and used port mappings and network address translation similar to Docker, we could have used it. Unfortunately, there is no such easy option.
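To illustrate what "no assumption of one IP per microservice" means in practice, here is a minimal sketch of a registry keyed by service name that resolves to (host, port) pairs. This is not the Aporeto discovery mechanism, which the post does not describe; the `Registry` and `Endpoint` names, addresses, and ports are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A reachable instance: the host may be shared, the port tells services apart."""
    host: str
    port: int


class Registry:
    """Hypothetical in-memory stand-in for a service discovery backend."""

    def __init__(self):
        self._endpoints = {}

    def register(self, service, host, port):
        self._endpoints.setdefault(service, set()).add(Endpoint(host, port))

    def deregister(self, service, host, port):
        self._endpoints.get(service, set()).discard(Endpoint(host, port))

    def lookup(self, service):
        # Sorted so that callers always see a stable order.
        found = self._endpoints.get(service, set())
        return sorted(found, key=lambda e: (e.host, e.port))


# Two different services can live on the same host address (illustrative values).
registry = Registry()
registry.register("api", "10.0.0.5", 8443)
registry.register("api", "10.0.0.6", 8443)
registry.register("db", "10.0.0.5", 27017)
```

Because callers resolve a name to a host and a port, nothing in the client depends on each service owning its own IP address.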
Given the above characteristics, we decided to run our containers in the host network namespace in this limited environment. This is something of an anti-pattern for a Kubernetes deployment, since all pods on a node now share that node's network namespace. In this scaled-down deployment they all get the node's IP address and must therefore use different ports. However, we need no overlay or any other network architecture change. Such a cluster works in any infrastructure, whether virtualized or bare metal. It is easy to maintain and it does not introduce any operational friction. It also let us get rid of a huge amount of operational complexity. From an operations perspective, deploying this system is no different from deploying a set of hosts in a user's infrastructure that run an old-style monolith.
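As a rough sketch of what this looks like in a manifest, the Python below builds a Deployment whose pod template sets `hostNetwork: true`, plus a small helper that flags services claiming the same port. The service names, image names, and ports are invented for illustration, and the `apps/v1` API version is an assumption about the cluster; the post does not show its actual manifests.

```python
import json


def host_network_deployment(name, image, port, replicas=1):
    """Build a Deployment manifest whose pods share the node's network namespace."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: a cluster that serves apps/v1
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "hostNetwork": True,
                    # Keeps cluster DNS resolution working for host-network pods.
                    "dnsPolicy": "ClusterFirstWithHostNet",
                    "containers": [{
                        "name": name,
                        "image": image,
                        # With hostNetwork, hostPort must equal containerPort.
                        "ports": [{"containerPort": port, "hostPort": port}],
                    }],
                },
            },
        },
    }


def find_port_conflicts(services):
    """Return ports claimed by more than one service on the shared host IP."""
    owners = {}
    for name, port in services.items():
        owners.setdefault(port, []).append(name)
    return {port: names for port, names in owners.items() if len(names) > 1}


# Hypothetical services; every one needs its own port on the node.
services = {"api": 8443, "policy-engine": 8444}
manifest = host_network_deployment("api", "example/api:1.0", services["api"])
print(json.dumps(manifest, indent=2))
```

The printed JSON can be fed to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. The conflict check is the kind of bookkeeping that IP-per-pod normally makes unnecessary and that a host-network design has to take on itself.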
In other words, by simply switching to host networks we can keep the Kubernetes features that we love: robust scheduling, high availability, and dynamic deployment of services. And we could get rid of unnecessary complexities.
But what about security? By using host networks in our deployment we have no network stack isolation between the microservices. So, how do we address security? The good news is that this is a restricted environment with no multi-tenancy and we can make sure that proper capabilities and security profiles are propagated to all the containers. But we still need to achieve network security without relying on network structures.
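One way to read "propagating capabilities and security profiles to all the containers" is to stamp the same `securityContext` onto every container in a pod spec. The sketch below does that; the specific settings (dropping all capabilities, a read-only root filesystem) are assumptions chosen for illustration, not the profile the post's authors used.

```python
import copy

# Assumed restrictive defaults; the real profile is not given in the post.
RESTRICTED_CONTEXT = {
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
    "capabilities": {"drop": ["ALL"]},
}


def apply_security_context(pod_spec, context=RESTRICTED_CONTEXT):
    """Return a copy of pod_spec with the same securityContext on every container."""
    hardened = copy.deepcopy(pod_spec)
    for container in hardened.get("containers", []):
        container["securityContext"] = copy.deepcopy(context)
    return hardened


# Hypothetical host-network pod spec with two containers.
pod_spec = {
    "hostNetwork": True,
    "containers": [{"name": "api"}, {"name": "sidecar"}],
}
hardened_spec = apply_security_context(pod_spec)
```

This only limits what each process can do on the host; it does not restrict which services can talk to each other, which is the remaining problem.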
The technology we have designed in Aporeto is based on the concept of decoupling security from the network. The Aporeto isolation mechanisms do not care whether containers run in the same network or across multiple availability zones. They do not depend on a specific SDN plugin or on any network assumptions, and they support containers and Linux services equally well.
The answer to our security problem was to run Aporeto on top of Aporeto.
As we have explained in a previous blog, the Trireme architecture supports containers equally well whether they run over a bridge, over IPVLAN/MacVLAN, or on a host network. We could therefore use Aporeto itself to protect the Aporeto services.
We used the Trireme Kubernetes plugin, which does not require any control plane other than basic Kubernetes, and deployed it in our small cluster. We used basic network policies in Kubernetes to isolate our components, while Trireme Kubernetes provided the enforcement mechanism even on top of host networks.
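For readers unfamiliar with Kubernetes network policies, the sketch below builds a standard `NetworkPolicy` object that lets one labeled component reach another on a single TCP port. The policy name, labels, and port are hypothetical. Enforcement depends entirely on the plugin in use: the post states that Trireme Kubernetes enforced such policies on host networks, and this example makes no claim about how other enforcers treat host-network pods.

```python
import json


def allow_ingress_policy(name, target_app, allowed_app, port, namespace="default"):
    """Build a NetworkPolicy admitting only allowed_app pods to target_app on one port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Pods allowed to connect.
                "from": [{"podSelector": {"matchLabels": {"app": allowed_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


# Hypothetical components: only "api" pods may reach "db" pods on port 27017.
policy = allow_ingress_policy("db-from-api", "db", "api", 27017)
print(json.dumps(policy, indent=2))
```

Note that the policy selects pods by label rather than by IP address, which is what lets it stay meaningful when every pod on a node shares the same address.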
We now had a highly secure cluster with very granular policy, without the complexities of SDN systems, tunnels, and IP address management. The Trireme DaemonSet was managed by Kubernetes itself, which took care of high availability, restarts, updates, and so on.
So, yes, the answer was just “Turtles all the way down”.
A big source of complexity in any distributed system deployment ends up being the network. The Kubernetes design has gone to a lot of effort to decouple applications from the network by introducing the concept of network plugins and adopting CNI. The design choice of a single IP per container is mainly driven by the need to support a simple service discovery mechanism (plain DNS) that does not depend on random ports and minimizes the complexities of port management. This design choice introduces complexity that, although needed for scale, becomes a challenge when one tries to scale down and simplify a deployment.
The reality, though, is that for applications designed around a new microservices pattern, service discovery is a key function that is often much more complex than a simple DNS lookup. Rolling upgrades, canary deployments, and circuit breaking cannot be implemented with just a DNS lookup. We realized through this process that the complexity of networking is an anti-pattern that is carried over to support legacy environments with classic DNS-based service discovery mechanisms.
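To make the canary case concrete, here is a minimal sketch of weighted version selection, the kind of decision a discovery client has to make and a plain DNS answer does not express. The version names and the 95/5 split are hypothetical, and this is an illustration of the general technique rather than a description of the Aporeto implementation.

```python
import random


def pick_version(weights, rng):
    """Pick a version name with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical canary rollout: send roughly 5% of requests to the new version.
weights = {"v1": 95, "v2": 5}
rng = random.Random(0)  # seeded only to make the sketch repeatable

counts = {"v1": 0, "v2": 0}
for _ in range(10_000):
    counts[pick_version(weights, rng)] += 1

print(counts)
```

Shifting traffic is then a matter of changing the weights in the discovery layer, with no change to addresses or to the network underneath.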
Our thesis is that when designing new microservices architectures one can easily get rid of this complexity, by using some simple steps:
As always there are several trade-offs to consider and there are performance implications everywhere.