3 Reasons Why The Host Rules Cloud IDS

To truly appreciate why companies like Threat Stack point to the Cloud as a watershed event in their corner of the software industry, one must push past the hype and worn platitudes about “the Cloud with a capital C.” The reality is that the Cloud’s side effects have had the largest impact: operating costs driven down by providers’ purchasing power at scale, and a forced move to software-only solutions.

This has certainly been felt in intrusion detection systems (IDS). They have traditionally been deployed as network hardware devices enabled by access to the network infrastructure, but are struggling to find relevance in a world where the traditional network boundary no longer exists.

This software shift has forced innovation in how organizations collect data from their Cloud environments and analyze that data for threat indicators. Some organizations began to work on this problem before the Cloud became pervasive, but now everyone who wants to run their workload in the Cloud must contend with IDS in a software-only environment.

This shift is an opportunity to reevaluate whether it makes sense to collect data from and about the network or the host. How you answer this question directly impacts the complexity of your solution and defines the types of questions you will be able to answer about your environment.

We’ll examine three core reasons why we at Threat Stack believe that host-based intrusion detection is best suited to the Cloud and software-defined environments.

#1 The Workload Holds The Truth

If everything is software, then that software, the workload itself, must be part of how we do security monitoring. This means that net flows by themselves are no longer sufficiently valuable; they must be contextualized with the users and processes that participated in the communication. For example, is it useful to know that an SSH connection left your environment if you cannot trivially associate it with a system user?

This context is critical for forensics and basic alerting, but let’s push it further. Role-based deployments of infrastructure became increasingly popular in the Cloud because it is so trivial to provision and manage many small instances. Therefore, each unit should perform one discrete function, meaning that if a server alters its workload, its operators should be notified of a potential breach or misconfiguration. This is one reason why at Threat Stack we recommend grouping your policies based on roles that match your environment, allowing us to detect new processes, network activity, and file system changes across a whole set of servers.

Put another way, all your web servers should look and act the same, and your IDS should notify you if this is not the case. While a network-based approach may notice a change in the amount or type of traffic being emitted from your server, it will not be able to tell you the source in your workload that caused the abnormality. This means that it is up to you, the operator, to correlate host and network information, which meaningfully impacts your mean time to root cause.
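To make the "same role, same behavior" idea concrete, here is a minimal sketch of a role-based check. It is an illustration only, not Threat Stack's implementation: the function name `find_role_outliers`, the `min_share` threshold, and the host and process names are all hypothetical. It flags any process that runs on less than a given share of the hosts in a role.

```python
from collections import Counter


def find_role_outliers(processes_by_host, min_share=0.5):
    """Return {host: processes} for processes seen on fewer than
    min_share of the hosts that share a role (hypothetical policy)."""
    host_count = len(processes_by_host)

    # Count how many hosts in the role run each process name.
    seen_on = Counter()
    for procs in processes_by_host.values():
        seen_on.update(set(procs))

    outliers = {}
    for host, procs in processes_by_host.items():
        rare = {p for p in procs if seen_on[p] / host_count < min_share}
        if rare:
            outliers[host] = rare
    return outliers


# Hypothetical inventory for a "web" role: web-3 runs something its peers do not.
web_role = {
    "web-1": {"nginx", "sshd"},
    "web-2": {"nginx", "sshd"},
    "web-3": {"nginx", "sshd", "nc"},
}

print(find_role_outliers(web_role))  # {'web-3': {'nc'}}
```

A real policy would likely also track network activity and file system changes, as described above, but the principle is the same: the role defines the baseline, and the deviation is the alert.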

There is also no guarantee that the network sensor would have even appreciated the workload change. It certainly could not have told you that a non-system user was accessing your database instead of your authorized workload.

#2 The Hidden Cost of Network IDS

Another fascinating side effect of Cloud computing is that network, or bandwidth, is the last finite resource in our environments. Theoretically there is an infinite amount of compute and storage that can be provisioned, and companies like Amazon and Google are incentivized to make as much of these resources available to you as they can. We have yet to see the same sorts of gains in Cloud service providers’ (CSP) networks though.

When combined with the aforementioned ease of provisioning many smaller instances, it is no wonder that the benefits of parallelization have outpaced the rate at which per-core speeds have increased. That is to say, horizontal scaling can produce greater processing power and efficiency than vertical scaling.

Why does this matter when discussing host versus network monitoring? Solutions that tap a network interface and send carbon copies of packets to an off-host analysis tool effectively cut your potential bandwidth in half. This might be okay if the analysis is of a high value, but the vast majority of that net flow data will not be valuable, which means that you are paying through the nose for a finite resource with little return.
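The "cut in half" figure follows from simple arithmetic. The sketch below assumes a deliberately simplified model in which every mirrored byte leaves through the same interface as the original traffic; the function name and the 10 Gbps link speed are illustrative assumptions, not measurements.

```python
def usable_bandwidth_gbps(link_gbps, mirrored_fraction):
    """Bandwidth left for the workload when a fraction of its traffic is
    copied to an off-host analyzer over the same interface.

    Simplified model: each workload byte costs (1 + mirrored_fraction)
    bytes of link capacity.
    """
    if not 0.0 <= mirrored_fraction <= 1.0:
        raise ValueError("mirrored_fraction must be between 0 and 1")
    return link_gbps / (1.0 + mirrored_fraction)


print(usable_bandwidth_gbps(10.0, 1.0))  # full mirroring: 5.0
print(usable_bandwidth_gbps(10.0, 0.0))  # no mirroring: 10.0
```

Under this model, mirroring only part of the traffic reduces the penalty, but anything short of zero mirroring still spends link capacity on copies.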

This is especially costly if you are one of the many companies, like Threat Stack, working with large data sets; in practice the overhead would force you to disable network monitoring for those clusters. Most databases are already replicating data between hosts and many modern databases are performing these transactions inline across more than one host. Replaying those streams and doubling network traffic on your database servers is costly for both system performance and your CSP bill. Yet it is hard to imagine that you would not want your IDS protecting your databases.

This was not as much of a problem previously for network-based analysis because a vendor would sell you a piece of hardware that you would feed traffic to with port mirroring, allowing you to drastically decrease the impact on network performance. Hopefully it’s unnecessary for me to point out that this will not work in the Cloud as there is no loading dock address or remote hands.

Sadly, many vendors are attempting to replicate this model with “virtual appliances” and network-based solutions instead of embracing workload-focused analysis at the host layer. “Cloud washing” network analysis is simply untenable given the way we think about and build our systems today.

#3 Concerns with Proxy Availability, Complexity & Security

Let us assume that you are okay paying the price to stream all or most of your traffic twice. One common technique to decrease the per-host penalty is to proxy all host-to-host network traffic through a set of servers that are running the network monitor or tap.

This setup heavily penalizes east-west network traffic (server-to-server), especially if you can imagine introducing an extra network hop to each RPC. But only proxying north-south network traffic (external-to-internal) does not give you sufficient visibility since you must understand how your workloads interact with one another. Plus, as all of those articles from the 1990s and 2000s taught us, perimeter detection alone is not sufficient and you must deploy monitoring with a “defense in depth” mindset.

However you arrange your proxies, you really only get insight into the traffic if you can decrypt it to inspect the payload. Your north-south network traffic is almost certainly encrypted and I have seen more companies encrypt east-west network traffic on principle due to fear stemming from high-profile disclosures. Decrypting that traffic costs compute and likely introduces more latency into the proxy’s network hop.

Not only does this put you in a position of managing many proxy servers to sustain high availability and throughput, but you also have to deploy certificates and private keys onto the proxies to decrypt the traffic. This could be a very risky proposition depending on where the proxy sits in your network architecture. This is especially true if you are decrypting north-south traffic on a NAT host, because now your private keys and certificates sit on a WAN-routable host.

Host-based monitoring does not introduce any of these complexities. We go so far as to never inspect your network packets at Threat Stack, instead only monitoring the network connection table and associating those connections with processes in your system. This way we can tell you who or what your workload is talking to, and likely why, without having to penalize network performance, introduce new availability concerns, or potentially put private keys at risk.
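As a rough illustration of what reading the connection table can look like on Linux, the sketch below parses text in the `/proc/net/tcp` format and pairs each connection with the process that owns its socket. This is not Threat Stack's implementation. It assumes IPv4, a little-endian host (which is how the hex addresses in that file are laid out on x86), and uses made-up sample data; the helper names and the `(4242, "ssh")` owner entry are hypothetical.

```python
import os
import re
import socket
import struct
from typing import NamedTuple

TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}  # subset of kernel states


class Connection(NamedTuple):
    local: str
    remote: str
    state: str
    uid: int
    inode: int


def _decode_addr(hex_addr):
    """Turn '0100007F:1F90' into '127.0.0.1:8080' (little-endian IPv4)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_tcp(text):
    """Parse the contents of /proc/net/tcp into Connection records."""
    connections = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        fields = line.split()
        connections.append(Connection(
            local=_decode_addr(fields[1]),
            remote=_decode_addr(fields[2]),
            state=TCP_STATES.get(fields[3], fields[3]),
            uid=int(fields[7]),
            inode=int(fields[9]),
        ))
    return connections


def socket_inode_owners(proc_root="/proc"):
    """Map socket inode -> (pid, name) via /proc/<pid>/fd links (Linux only)."""
    owners = {}
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                name = f.read().strip()
            for fd in os.listdir(fd_dir):
                target = os.readlink(os.path.join(fd_dir, fd))
                match = re.fullmatch(r"socket:\[(\d+)\]", target)
                if match:
                    owners[int(match.group(1))] = (int(pid), name)
        except OSError:
            continue  # process exited or we lack permission to inspect it
    return owners


def attribute(connections, owners):
    """Pair each connection with its owning process, or None if unknown."""
    return [(conn, owners.get(conn.inode)) for conn in connections]


# Made-up sample in /proc/net/tcp format: a local listener and an outbound SSH session.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0500000A:D431 0A00000A:0016 01 00000000:00000000 00:00000000 00000000  1001        0 67890 1 0000000000000000 20 4 30 10 -1
"""

sample_owners = {67890: (4242, "ssh")}  # hypothetical; use socket_inode_owners() on a live host
for conn, owner in attribute(parse_proc_net_tcp(SAMPLE), sample_owners):
    print(conn.local, "->", conn.remote, conn.state, "uid", conn.uid, "owner", owner)
```

Note that no packet payload is read at any point: the remote endpoint, the user ID, and the owning process all come from kernel bookkeeping on the host, which is why this approach needs neither traffic mirroring nor decryption keys.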

In our experience and that of our customers, deploying a host-based IDS is far simpler than deploying a network-based one, and that simplicity is itself a security feature.

Final Thoughts

If you look back over this article, you will notice that the majority of our comparison has been from the architectural and infrastructure point of view. This must be the starting point for new security products looking to secure Cloud deployments, as we have seen in other software verticals. Forklifting on-premises assets or Cloud washing concepts for off-premises deployments leads to products that are confusing or that deliver less value.

Focusing on the workload and understanding the new architectural behaviors in the Cloud, such as the lack of network control and bandwidth availability, naturally leads to host-based intrusion detection systems.