The Security Blast Radius in Cloud Native Applications

By: Dimitri Stiliadis 04.17.2018

This blog will explore the blast radius of an attack on cloud native applications.

The term ‘blast radius’ has often been used to describe the effect of a security breach on an application or organization. In the days of private data centers and perimeter security, the blast radius was often the perimeter of the data center. Once attackers found a way inside the perimeter, they would move laterally and amplify the attack across applications and networks. As technology matured and the concept of segmentation appeared, solutions emerged that attempted to use network boundaries to minimize the security blast radius. These techniques were viable and often achieved significant benefits.

The blast radius of an attack on cloud native applications has somewhat different implications. It has been well documented that it is unhelpful to think of a blast radius in these cloud native environments as defined by network boundaries since the weakest link is often the credentials and access management capabilities. We’ll start with a simple example of a cloud account (AWS, GCP, Azure).

IaaS and Multi-Account Techniques

AWS

It is widely recommended to use multiple accounts in order to minimize the security blast radius in an environment such as AWS. The API keys that applications or users hold carry a multitude of permissions within the environment. If a key with significant privileges is lost, it could easily be used to create, modify, or destroy resources, even within an AWS VPC. The attacker could, for example, create a VM of their own inside the VPC or change firewall rules, so the network boundary alone offers little protection. Using multiple accounts limits the blast radius of a lost API key to the account in which that key is valid.
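The reasoning above can be illustrated with a toy model. This is only a sketch: the account names, resource names, and the `blast_radius` helper are hypothetical, and the model assumes a leaked key grants full access to every resource in the accounts where it is valid.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApiKey:
    """A hypothetical API key and the accounts in which it is valid."""
    name: str
    accounts: set[str] = field(default_factory=set)


def blast_radius(key: ApiKey, resources_by_account: dict[str, set[str]]) -> set[str]:
    """Return every resource a leaked key could touch (toy model: full access per account)."""
    reachable: set[str] = set()
    for account in key.accounts:
        reachable |= resources_by_account.get(account, set())
    return reachable


# Hypothetical inventory: one shared account versus one account per environment.
single_account = {"shared": {"prod-db", "prod-web", "dev-web", "ci-runner"}}
multi_account = {
    "prod": {"prod-db", "prod-web"},
    "dev": {"dev-web"},
    "ci": {"ci-runner"},
}

leaked_shared = ApiKey("dev-laptop-key", {"shared"})
leaked_dev = ApiKey("dev-laptop-key", {"dev"})

print(len(blast_radius(leaked_shared, single_account)))  # 4
print(len(blast_radius(leaked_dev, multi_account)))      # 1
```

In the single-account layout the leaked developer key reaches production; in the multi-account layout it reaches only the development resources.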

GCP

GCP’s philosophy is a little different from that of AWS because of the concept of projects. The owner of the cloud native application can grant a user access to a subset of the projects in the account by establishing proper identity and access management rules. A user who needs access must authenticate remotely through the CLI and/or a web browser, and temporary credentials are then stored on disk (default: ~/.config/gcloud). If these credentials are lost, or copied by another user and moved to a different machine during an attack, the attacker can easily access any project that this user has access to. Because of the single-sign-on capability and users requiring access to multiple accounts, the multi-account trick that is often recommended in AWS does not really work. Users must pay extra attention to protecting these credentials. Lost or stolen laptops are especially dangerous in this respect.
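One basic hygiene check for locally stored credentials is to confirm that nobody but the owner can read them. The sketch below walks a directory and reports entries with any group or other permission bits set. The `loosely_permissioned` helper is hypothetical, the check assumes POSIX-style permissions, and it does nothing against a stolen laptop where the attacker already acts as the owner.

```python
import stat
from pathlib import Path


def loosely_permissioned(root: Path) -> list:
    """Return paths under root (including root) that grant any permission to group or others."""
    exposed = []
    if not root.exists():
        return exposed
    for path in [root, *root.rglob("*")]:
        if path.is_symlink():
            continue  # symlink modes are not meaningful on most systems
        mode = path.lstat().st_mode
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            exposed.append(path)
    return exposed
```

Calling `loosely_permissioned(Path.home() / ".config" / "gcloud")` would list any credential files in the default location mentioned above that are accessible beyond their owner.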

Kubernetes Clusters

When considering the security blast radius for cloud native applications in Kubernetes clusters and Docker containers, there are several dimensions to consider. As in the IaaS case, there is the whole sequence of attacks with stolen API keys. However, in addition to protecting the API keys for accessing the cluster, one has to consider the whole pipeline of application delivery and look at the blast radius of the different components:

• Container registry
• Container images
• Kubernetes credentials
• Container runtime

Let’s explore the risks in each of these dimensions:

Container Registry

Before we even consider the security of the cluster itself, we need to understand the security of the container registry. In order for users or systems to be able to push or update containers in the Docker registry, they need credentials. Authentication and authorization in the registry are performed through OAuth 2.0 tokens. These tokens have a limited lifetime, and they should be configured with the shortest practical expiration. However, if an attacker gets hold of an OAuth token for the registry, they can easily push an image update to the registry that may go unnoticed. This image update can include spyware or coin-mining software that may remain undetected for a long time.

Container Images

There has been a lot of discussion about vulnerability scanning in containers, both for private and public registries. It makes big headlines to disclose that a large number of images in a registry have vulnerabilities, and people tend to pay attention to this. Unfortunately, most vulnerability scanning is limited to very basic OS vulnerabilities. A scanner that matches known OS package vulnerabilities will not identify a coin-mining daemon buried deep in the code of a popular application image. These types of attack techniques have already shown up inside popular frameworks, and they can easily appear in popular images.

So, when it comes to minimizing damage from container images, one has to first consider the attack surface of the registry itself and strictly control who can push image updates to the registry. Limiting this capability to a CI/CD pipeline might reduce the attack surface, but one has to carefully manage access credentials for the pipeline itself.

It’s turtles all the way down. Who gives credentials to the CI/CD pipeline? If they are hardwired, the pipeline becomes a vulnerability itself.

Kubernetes Credentials

As with the registry and the IaaS platforms, Kubernetes has the concept of user and service accounts, which boils down to a fairly static token. If the token leaks, a third party can get direct access to your Kubernetes cluster. Similar to gcloud, these credentials can be read in the clear from a default file (~/.kube/config). If a user loses these credentials, the attacker has the same access to the cluster as the original user. And because there is an automatic renewal process, the attacker will continue to have this access for a while. In other words, the attack is not limited to a 24-hour window that closes when the credentials expire.
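A quick way to see how exposed a kubeconfig is would be to list which users carry credentials embedded directly in the file. The sketch below works on an already parsed config, for example the output of `kubectl config view --raw -o json` loaded with `json.load`. The field names follow the kubeconfig `users[].user` layout; the helper name and the choice of which fields count as static are assumptions.

```python
from __future__ import annotations

# Fields that hold a long-lived secret directly in the file (assumed list).
STATIC_FIELDS = ("token", "password", "client-key-data")


def users_with_static_credentials(kubeconfig: dict) -> dict[str, list[str]]:
    """Map each user name to the static credential fields found in its entry."""
    findings: dict[str, list[str]] = {}
    for entry in kubeconfig.get("users") or []:
        user = entry.get("user") or {}
        present = [name for name in STATIC_FIELDS if user.get(name)]
        if present:
            findings[entry.get("name", "<unnamed>")] = present
    return findings


# Hypothetical parsed kubeconfig with one static token and one exec-based user.
sample = {
    "users": [
        {"name": "alice", "user": {"token": "REDACTED"}},
        {"name": "bob", "user": {"exec": {"command": "hypothetical-credential-helper"}}},
    ]
}
print(users_with_static_credentials(sample))  # {'alice': ['token']}
```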

Container Runtime

Last but not least, let’s discuss the security blast radius of attacks on the container runtime of cloud native applications. A lot has been written about this, and we will not spend too much time on the topic. The bottom line is that with proper configuration, such as user namespaces and well-chosen seccomp profiles, one can minimize the attack surface.
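As a rough illustration of what proper configuration can mean in practice, the sketch below checks a container `securityContext` (as a parsed dictionary) for a few common hardening settings. The field names follow the Kubernetes `securityContext` schema; which checks to include and the `hardening_gaps` helper itself are assumptions, not a complete policy.

```python
def hardening_gaps(security_context: dict) -> list:
    """Return a list of human-readable gaps found in a container securityContext."""
    gaps = []
    if security_context.get("privileged"):
        gaps.append("privileged container")
    # Treat a missing value as not disabled for the purposes of this check.
    if security_context.get("allowPrivilegeEscalation", True):
        gaps.append("allowPrivilegeEscalation not disabled")
    if not security_context.get("runAsNonRoot"):
        gaps.append("runAsNonRoot not set")
    seccomp = (security_context.get("seccompProfile") or {}).get("type")
    if seccomp not in ("RuntimeDefault", "Localhost"):
        gaps.append("no seccomp profile")
    dropped = (security_context.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        gaps.append("capabilities not dropped")
    return gaps


hardened = {
    "runAsNonRoot": True,
    "allowPrivilegeEscalation": False,
    "seccompProfile": {"type": "RuntimeDefault"},
    "capabilities": {"drop": ["ALL"]},
}
print(hardening_gaps(hardened))  # []
print(len(hardening_gaps({})))   # 4
```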

It’s All About Identity Protection

As you can see in all the above examples, the security blast radius and attack surface in cloud deployments is not the boundary of the network. It is not a machine, a subnet, or a VPC. The weakest link in all these attacks is actually identity and credentials for users and applications.

Without proper management of identities and access control in these platforms, a single attacker can create havoc with nothing fancier than stealing user credentials or finding an API key on GitHub. It is well known that Amazon AWS actually scans GitHub for API keys, but this sensitive information can leak in many different ways.
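A minimal sketch of this kind of scanning, which a team could run on its own files before pushing them, is shown below. It assumes the widely documented shape of AWS access key IDs (a four-letter prefix such as `AKIA` or `ASIA` followed by 16 uppercase letters or digits). This is not a description of how AWS or GitHub scan repositories, and the helper name is hypothetical. The sample key is the placeholder value used in AWS documentation.

```python
import re

# Assumed pattern for AWS access key IDs: known prefix plus 16 uppercase alphanumerics.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_access_key_ids(text: str) -> list:
    """Return (line_number, match) pairs for strings that look like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_access_key_ids(sample))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```

A pattern match only flags candidates; it cannot tell whether a key is live, and it will miss secrets that do not follow a recognizable format.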

The problem is amplified with programmatic access to APIs that brings out the power of these platforms. This programmatic access becomes the weakest link.

What Should Users Do?

Most of these platforms provide very sophisticated mechanisms for controlling access to APIs through robust authorization mechanisms. However, they tend to be rather weak on the token validity front, since the providers do not want to compromise usability with the overhead of security. That is the main reason that the AWS access token is valid for a year. Addressing this problem requires a different way of thinking:

  • Access credentials cannot be static parameters; they must accumulate context. This is similar to the idea of using biometrics or voice recognition for user access. Applications and clients have as much of a “fingerprint” as a user’s voice.
  • Access management requires a simpler framework. The number of reports, questions, and mistakes on how to manage RBAC in Kubernetes or IAM in AWS is overwhelming. The number one reason for this is that policy is complex. It is essentially a quadratic problem where someone has to whitelist every combination of allowed accesses. Unless we find more efficient mechanisms for expressing policy, we will be doomed to errors.
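The first point, credentials that carry context, can be sketched as a short-lived token bound to a client fingerprint. This is an illustration under stated assumptions, not a description of any platform's mechanism: the `issue` and `verify` helpers, the token layout, and the fingerprint string (here a host address and image digest) are all hypothetical.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _digest(fingerprint: str) -> str:
    return hashlib.sha256(fingerprint.encode()).hexdigest()


def issue(secret: bytes, subject: str, fingerprint: str, ttl: int, now: float | None = None) -> str:
    """Issue an HMAC-signed token bound to a context fingerprint and an expiry time."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "ctx": _digest(fingerprint), "exp": int(now) + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify(secret: bytes, token: str, fingerprint: str, now: float | None = None) -> bool:
    """Accept the token only if the signature, expiry, and presented context all match."""
    now = time.time() if now is None else now
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now >= claims["exp"]:
        return False
    return hmac.compare_digest(claims["ctx"], _digest(fingerprint))


secret = b"demo-secret"
ctx = "host=10.0.0.5;image=sha256:abc"
token = issue(secret, "svc-a", ctx, ttl=60, now=1000)
print(verify(secret, token, ctx, now=1030))                               # True
print(verify(secret, token, "host=10.9.9.9;image=sha256:abc", now=1030))  # False
print(verify(secret, token, ctx, now=1061))                               # False
```

A token copied to another machine fails verification because the presented context no longer matches, and even an unmoved token stops working once its short lifetime ends.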

