If you have either deployed or are planning to deploy a workload to the Cloud, perhaps using AWS, you are looking to run your operations efficiently without compromising security. In a recent post we discussed the AWS Shared Responsibility Model in which you are responsible for the security of your own data, platform, applications, and networks in the Cloud, while AWS is responsible for the security of the Cloud itself. Being security conscious, you understand this model and may have followed the AWS Security Best Practices in an effort to harden your EC2 instances.
With these best practices in place, your servers are certainly hardened and more secure, but risks remain. Many of the hardening steps you have implemented are great at protecting against outside-in threats, but not necessarily threats originating from the inside, such as stolen operator credentials or malicious insiders. Nor can the integrity of the software you run ever be guaranteed, as demonstrated by vulnerabilities still being discovered in 25-year-old Unix applications. The only way to start understanding and then managing these risks is to gain visibility. As your technical team knows, “If it moves, track it.”
The Elastic and Ephemeral Nature of the Cloud
While auto-scaling up and down has always been one of the compelling promises of virtual and Cloud infrastructure, much of the focus has shifted to immutable infrastructure. Tools like Docker have made these techniques much easier for developers to work with and for operators to deploy. Many believe these techniques make security more difficult: more instances are deployed, algorithms deploy instances instead of humans interacting with asset management tools, and direct control is weakened.
The reality, though, is that the elastic nature of the cloud makes security easier. This is because you are scaling like-instances. Those instances are either pre-baked with all the software and applications they need, or are running a Configuration Management tool like Chef that forces the system to conform to an expected configuration every 10 to 15 minutes. Therefore, it does not matter how many instances you are running — they should all look and act the same. This makes even more sense if you are using a Role Based Architecture where, for example, all of your web servers should look and act the same.
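The convergence idea behind a Configuration Management run can be sketched in a few lines. This is a deliberately simplified model, assuming configuration is a flat dictionary of settings (real tools like Chef model richer resources), but it shows why a periodic run is safe: a conforming system produces no changes.

```python
# Minimal sketch of configuration convergence: compare desired state to
# actual state and compute only the changes needed to restore conformance.
# The dict-of-settings model here is a hypothetical simplification.

def converge(desired: dict, actual: dict) -> dict:
    """Return the settings that must change so actual matches desired."""
    return {key: value for key, value in desired.items()
            if actual.get(key) != value}

desired = {"apache_user": "www-data", "ssh_root_login": "no"}
drifted = {"apache_user": "root", "ssh_root_login": "no"}

# Only the drifted setting needs correcting; a run against a
# conforming system is a no-op, so re-running every 10 to 15 minutes
# is harmless.
print(converge(desired, drifted))  # → {'apache_user': 'www-data'}
```

Because every run converges toward the same expected state, it does not matter whether you are managing ten like-instances or a thousand.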
If you are pre-baking those instances in an effort to employ Immutable Infrastructure, then security gets even easier. Users should never (or rarely) be logging into those instances. Need to upgrade a piece of software? Build and deploy a new instance, replacing the old one. Need to restart a service? Restart the instance. Think malware has landed and is calling home to command and control? Burn the instance down and replace it. As the saying goes, “Treat your servers as cattle, not pets.”
One side effect of this methodology is that you need to track what is going on inside an instance and retain that record. Retention becomes critical when you need to know what happened in an ephemeral instance that ran for only 12 hours two weeks ago. It is also helpful for understanding how behaviors have changed over time. For example, did the Apache web server always run as root and spawn a shell, or is that new behavior? A reliable and complete history of how your workload has behaved over time allows you to conduct a much more effective forensic investigation to determine whether you have been breached and to what extent.
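The "is this behavior new?" question above can be answered mechanically once events are retained. A sketch, assuming a hypothetical retained log of (timestamp, process, action) events; real agents capture far richer telemetry:

```python
# Hypothetical retained event history: answer whether a process has
# ever performed an action before, and if so, when it was first seen.

from datetime import datetime

history = [
    (datetime(2015, 6, 1), "apache2", "accept_connection"),
    (datetime(2015, 6, 1), "apache2", "read_file"),
    (datetime(2015, 6, 14), "apache2", "spawn_shell"),
]

def first_seen(history, process, action):
    """Return the first time this process performed this action, or None."""
    times = [ts for ts, proc, act in history if proc == process and act == action]
    return min(times) if times else None

# A behavior that first appears late in an otherwise stable record is a
# candidate anomaly worth investigating.
print(first_seen(history, "apache2", "spawn_shell"))
```

Note that this only works if the record outlives the instance: the history must be shipped off-box before the ephemeral instance is burned down.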
Theft of Operator Credentials
Most recent compromises of Cloud services have involved the theft of credentials, either to gain initial access or to pivot within the organization’s infrastructure. This is concerning because credentials are often stolen through phishing (which can be hard to detect) or laptop theft. The 2015 Verizon Data Breach Investigations Report shows that phishing remains the most common attack vector for penetrating enterprise networks: 40% of breaches in 2014 started with email attachments and 35% started with an emailed link. In the recent breach of the U.S. Office of Personnel Management, the stolen personal information on four million employees is expected to be employed in follow-on phishing attempts, such as the recent attack on the Redstone Arsenal.
While there are many examples of personal information being found on stolen laptops that did not meet their organization’s encryption policies (information that can be leveraged for more advanced phishing attacks), the other concern is any VPN or SSH keys found on operator laptops. Multi-Factor Authentication (MFA) typically protects against these scenarios, since the keys are “something you have” and must be combined with “something you know,” like a password. Unfortunately, this often breaks down through password reuse, unenforced password complexity policies, passwords saved on the stolen laptop, or passwords stolen through follow-on phishing attacks supported by the information on the stolen laptop.
It is trivial to detect failed login attempts, a common compliance requirement. It is not trivial to detect whether a logged-in user is who they say they are. While you might not be employing a mainframe time accounting system to notice a 75-cent discrepancy, if you are monitoring workload and user behavior then you can understand what commands are typically run and how. For example, why is Bob logging in from the office when you know he’s visiting customers on the other coast? Why is Alice logging into servers that aren’t associated with her project? Is Charlie touching AWS services like S3 and RDS when he’s only ever bootstrapped EC2 instances?
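Checks like the ones above amount to comparing each event against a per-user baseline. A minimal sketch, where the baseline profiles, field names, and users are all hypothetical:

```python
# Hypothetical per-user behavioral baselines built from observed history.
# A stolen credential still produces activity outside the operator's
# established pattern, which is what we flag.

baselines = {
    "bob":     {"location": {"west_coast"}, "services": {"ec2"}},
    "charlie": {"location": {"office", "home"}, "services": {"ec2"}},
}

def anomalies(user: str, event: dict) -> list:
    """Return which fields of this event fall outside the user's baseline."""
    profile = baselines.get(user, {})
    return [field for field, seen_values in profile.items()
            if event.get(field) not in seen_values]

# Charlie has only ever bootstrapped EC2 instances, so touching S3
# stands out even though the login itself succeeded.
print(anomalies("charlie", {"location": "office", "services": "s3"}))  # → ['services']
```

A real system would score and weight these deviations rather than treat every mismatch as an alert, but the core idea is the same: known history makes the imposter visible.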
Regardless of how the actor was able to assume your operator’s identity, their behavior will stand out if you have a history of how operators and workloads behave.
Vulnerabilities: Looking at the Behavior
Server hardening procedures are commonly based on known fixes for known software issues. Compliance requirements reinforce this mindset by requiring you to monitor the CVE fire hose. An increasingly likely scenario, however, is being breached through a recently announced vulnerability. The window of exposure between the discovery of a vulnerability and the vendor’s public release of a fix is where problems occur, especially since most traditional security monitoring tools are based on attack signatures. This means you are waiting for one vendor to release a fix and for another vendor to release a detection signature for your specific stack of software (OS, combination of software versions, etc.).
This is not specific to any kind of application. It does not matter whether you are dealing with the next Shellshock or Heartbleed, where rock-solid core software is vulnerable, or whether you are deployed in a cage with the latest Software Defined Data Center platform: there will be a window of an exploitable vulnerability. With this kind of surface area, much of which you may not even own, it becomes untenable to manage all the different permutations of signatures and patches, especially as the rate of change in environments increases, supported by tools that promote frictionless system releases.
This is why we believe that writing signatures for specific vulnerabilities is an increasingly outdated approach. Rather, we see far greater success in inspecting workload behavior to catch exploited vulnerabilities. In particular, we look for indicators of bad or unexpected behavior, such as curl running on a production server or a web server spawning a shell. These steps in the breach process are necessary for a bad actor to establish a foothold in your hardened server before pivoting deeper into your environment to achieve their objective.
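The "web server spawning a shell" indicator can be expressed as a check on process lineage rather than a vulnerability signature. A sketch, assuming a hypothetical snapshot of (pid, ppid, name) tuples; the suspect and server name lists are illustrative, not exhaustive:

```python
# Behavioral indicator sketch: instead of matching a signature for a
# specific CVE, flag process lineages a hardened web server should
# never produce, such as a worker spawning a shell or curl.

SUSPECT_CHILDREN = {"sh", "bash", "curl", "nc"}
WEB_SERVERS = {"apache2", "httpd", "nginx"}

def flag_lineage(processes):
    """processes: list of (pid, ppid, name) tuples.
    Return (parent_name, child_name) pairs where a web server
    spawned a suspect child process."""
    by_pid = {pid: name for pid, ppid, name in processes}
    return [(by_pid[ppid], name)
            for pid, ppid, name in processes
            if name in SUSPECT_CHILDREN and by_pid.get(ppid) in WEB_SERVERS]

processes = [
    (1, 0, "init"),
    (100, 1, "nginx"),      # master process
    (101, 100, "nginx"),    # worker
    (202, 101, "sh"),       # worker spawned a shell: unexpected
]
print(flag_lineage(processes))  # → [('nginx', 'sh')]
```

The check is version-agnostic: it fires whether the shell came from Shellshock, a fresh zero-day, or a misused operator credential, which is exactly the point of watching behavior instead of signatures.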
Wrapping it All Up
We covered a lot of ground, so let’s summarize a few points.
- Techniques like auto-scaling and Immutable Infrastructure do not increase the risk in your environment. They can actually improve your posture, especially if you pay attention to monitoring their behaviors.
- Stolen credentials and internal bad actors are often the largest blind spot for an organization because the breach occurs from the inside, while most hardening efforts take an outside-in view.
- Relying on attack signatures has become insufficient given the high rate of change in environments and can leave you in a window where a vulnerability has been reported but no signature has been published.