One of the most widely deployed security practices is “segmentation”: the process of separating end-points into different trust domains and controlling interactions between those domains through policy rules. Segmentation manages information flow between domains and therefore reduces the attack surface between them. In essence, it limits the “blast radius,” or the portion of an application that can be directly affected if an attacker manages to penetrate one of the trust domains.
Segmentation was initially based on IP subnets and VLANs. Operations assumed a static association between services and servers (or IP addresses), and by placing servers in different VLANs, administrators could enforce isolation between services. Firewalls were often deployed to enforce policy rules between VLANs.
Virtualization broke some of the basic assumptions of these implementations, since it removed the static association of IP addresses to services. A new set of proprietary and standard solutions (see IEEE VEB and VEPA) were developed to automate the mapping of VM end-points to VLANs; firewalls struggled to keep up.
The cloud has further challenged the static assumptions of VLANs while imposing larger scaling requirements. Spanning VLANs across multiple racks quickly became the bottleneck. Simultaneously, concepts such as the security groups AWS introduced were easy for developers to understand and expanded the segmentation model to a more granular level, where any set of hosts, irrespective of IP subnets, could be grouped into different trust domains.
As a response to the cloud, the networking industry reinvented itself once more and borrowed concepts from VPNs, moving isolation into tunnels. GRE, VXLAN, and Geneve were adopted to create virtual VLAN-type structures that could span multiple racks within or across data centers without the routing and management complexities of L2 technologies. SDN controllers, protocols (VTEP, OVSDB, VPC Peering, etc.), and software gateways became necessary as the requirements of segmentation solutions increased. The term “micro-segmentation” was coined to describe the ever larger number of segments within a larger network.
If we take a step back, we can quickly see that all of these segmentation solutions are based on the same concept. They start with a fundamental assumption that services (or end-points) are identified by their IP addresses and potentially port numbers. Then, they introduce one level of indirection by mapping sets of end-points to an identifier (VLAN, VXLAN ID, MPLS label, etc.) and make policy decisions based on that identifier. Finally, they use a control plane to disseminate state about these associations and the corresponding policy rules. In the virtualization and cloud worlds, the mapping of end-points to identifiers is programmed through orchestration tools, and secondary control planes optimize the distribution of state for each virtual network.
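The common model can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the addresses and segment names are hypothetical:

```python
# Sketch of the shared segmentation model: endpoints are mapped to a
# segment identifier (VLAN, VXLAN ID, MPLS label, ...), and policy is
# evaluated on identifiers rather than on the endpoints themselves.

# The level of indirection: endpoint (IP address) -> segment identifier.
endpoint_to_segment = {
    "10.0.1.5": "vxlan-100",   # e.g. web tier
    "10.0.2.9": "vxlan-200",   # e.g. database tier
}

# Policy is expressed between identifiers, not between IP addresses.
allowed_segment_pairs = {("vxlan-100", "vxlan-200")}

def is_allowed(src_ip: str, dst_ip: str) -> bool:
    """Resolve both endpoints to segments, then check the segment policy."""
    src = endpoint_to_segment.get(src_ip)
    dst = endpoint_to_segment.get(dst_ip)
    return (src, dst) in allowed_segment_pairs

print(is_allowed("10.0.1.5", "10.0.2.9"))  # True: web -> db is permitted
```

A control plane's job, in this picture, is keeping `endpoint_to_segment` consistent on every enforcement point as workloads move.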
The adoption of containers, micro-services, and serverless architectures once more challenges the assumptions behind these solutions. Applications are disaggregated to smaller and often ephemeral components that are launched based on events. Segmentation needs to account for the dynamic nature of these applications; troubleshooting tunnels and control planes is becoming increasingly complex. The right tools are still evolving, and the quick convergence of a network is an ever larger problem.
A new set of solutions (still in their infancy) appears to challenge the benefits of network virtualization by removing the need for network-level segmentation. These solutions assume a simple, flat network, where every process or workload is still uniquely identified through its IP address and port numbers. They attempt to solve the segmentation problem by distributing ACLs to every host. The concept is quite simple: if there are two domains of workloads (call them A and B), the policy problem can be solved by installing an ACL on every server that hosts workload A with a rule that allows communication between A and B. Assuming a control and management plane that allows identification of workloads and distribution of state, segmentation problems are pretty much “solved.”
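The per-host ACL idea can be sketched as follows. All addresses and the rule format are hypothetical; real systems would render these as iptables/ipset or equivalent entries:

```python
# Sketch of the host-ACL approach on a flat network: every host running
# a workload-A container receives one allow rule per workload-B endpoint.

# Hypothetical addressing: 20 workload-B containers on 10.2.0.0/24.
workload_b_endpoints = [f"10.2.0.{i}" for i in range(1, 21)]

def acl_for_a_host(local_a_ip: str) -> list[str]:
    """Generate the allow rules one A-hosting server must carry."""
    return [f"allow {local_a_ip} -> {dst}" for dst in workload_b_endpoints]

rules = acl_for_a_host("10.1.0.7")
print(len(rules))   # 20 rules on this single host
print(rules[0])     # allow 10.1.0.7 -> 10.2.0.1
```

Every server hosting an A container needs its own copy of this rule set, which is exactly where the scaling pain discussed next comes from.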
But flattening the network comes with its own penalties. Consider a simple scenario where workloads A and B are composed of 20 containers each, placed on 40 different hosts. A flat topology requires creating and maintaining 400 ACL rules: each of the 20 servers hosting a container of type A must carry one rule per container of type B, or 20 rules per server. Now, if we consider a data center with 1,000 workloads of 20 containers each, we quickly arrive at a management nightmare of hundreds of thousands of ACLs.
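The arithmetic is worth spelling out. A back-of-the-envelope sketch, using the numbers from the scenario above plus one added assumption (that each workload communicates with only a single peer workload):

```python
# Back-of-the-envelope ACL counts for the flat-network approach,
# assuming one container per host.
containers_per_workload = 20

# One A->B policy: every A-hosting server needs one rule per B container,
# so a single workload pair costs 20 hosts * 20 rules.
rules_per_workload_pair = containers_per_workload * containers_per_workload
print(rules_per_workload_pair)   # 400

# 1000 workloads: even under the very conservative assumption that each
# workload talks to exactly one peer, that is 500 pairs.
workloads = 1000
pairs = workloads // 2
print(pairs * rules_per_workload_pair)   # 200000 rules data-center-wide
```

Real applications talk to more than one peer service, so the actual rule count would only grow from there.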
There are two key issues at this juncture. First, there is a question of convergence. Even if we succeed at distributing ACLs every time a workload is activated, when a new container is added to a target set, the control plane must update the ACL rules on all servers that host related containers. In application environments with ephemeral micro-services or event-based mechanisms (see AWS Lambda), where a process (container) is instantiated only to complete a specific task and terminates afterwards, the control plane will have to sustain an activation rate of several thousand ACL updates per second. Second, on the performance front, introducing thousands of ACLs on every host has its own challenges, since ACL lookup is essentially a point-location problem in a multi-dimensional space with significant per-packet complexity. Current Linux systems with ipsets give the illusion of simplicity through a space-time trade-off, but still spend several CPU cycles on every packet.
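The convergence problem is also easy to quantify with a rough sketch. The churn rate below is a hypothetical figure chosen for illustration, not a measurement:

```python
# Sketch of control-plane churn: adding or removing one container in a
# target set forces an ACL update on every host running a peer container.
hosts_running_peers = 20           # hosts with workload-A containers
updates_per_container_event = hosts_running_peers

# Ephemeral, Lambda-style workloads: suppose (hypothetically) containers
# churn at 100 launch/terminate events per second across the data center.
events_per_second = 100
print(updates_per_container_event * events_per_second)  # 2000 ACL updates/s
```

Every one of those updates must land on the right host, in order, before traffic flows correctly, which is what makes convergence the bottleneck.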
So, one has to ask the question: if network-based segmentation (VLANs, network virtualization, etc.) creates operational complexity, and flat networks with ACLs cannot scale, are there any alternatives?
We need to be pursuing trust-centric security instead. For years, we have been applying networking solutions to what is really a trust problem between domains, instead of focusing on understanding the problem itself. The key requirement is to control information flow between the components (services) that form an application. Services need to be identified by their characteristics and not by their IP addresses, which should be used only as location and routing identifiers. Policies need to control information exchange between services, not between IP addresses. We essentially need to solve an authentication and authorization problem, not a packet ACL problem. The mechanisms used to forward packets and organize the network for operational reasons do not need to be coupled with security policies; the two must evolve independently.
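What would identity-based policy look like? A minimal sketch, in which all attribute names and the rule itself are hypothetical: services carry descriptive attributes, policy is an authorization decision over those attributes, and IP addresses appear nowhere:

```python
# Sketch of trust-centric policy: services are identified by their
# characteristics (labels), and policy is an authorization decision over
# those characteristics. Addressing and routing stay out of the picture.
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceIdentity:
    app: str          # which application the service belongs to
    tier: str         # role within the application
    environment: str  # e.g. "prod" or "staging"

def authorize(src: ServiceIdentity, dst: ServiceIdentity) -> bool:
    """Example rule: frontends may call backends of the same app and env."""
    return (src.app == dst.app
            and src.environment == dst.environment
            and src.tier == "frontend"
            and dst.tier == "backend")

web = ServiceIdentity(app="shop", tier="frontend", environment="prod")
db = ServiceIdentity(app="shop", tier="backend", environment="prod")
print(authorize(web, db))  # True
```

Note what is absent: no IP address, no segment identifier, and nothing to reconverge when a container moves, because the identity travels with the service rather than with its network location.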