Real World: How Machine Learning Modeling Works–ThreatML

As cybersecurity in a zero-trust era becomes increasingly important for organizations protecting customer data and business operations, anomaly detection and known-threat detection alone are no longer enough. Rule sets that produced useful cyberattack alerts even a few years ago are increasingly outdated and full of gaps, especially as more organizations move their data and operations to the cloud. Machine learning modeling promised to help, but when?

To model, predict, and respond to those cloud-native cybersecurity gaps, Threat Stack recently launched ThreatML with supervised learning. As more customers embrace the high-efficacy threat detection offered by this new security capability and work supervised learning into their daily security operations, we are getting more and more questions about real-world examples of where this type of detection works best and how to implement it in their own environments.

First, it’s important to understand that ThreatML combines rules with supervised learning to deliver threat detection with very high efficacy and very little human effort. It baselines, then predicts workload behaviors to automatically suppress uninteresting findings and highlight real, actionable threats, known and unknown.

High-Efficacy Cyber Alerts without Alert Fatigue

How does this happen? Threat Stack built a model that understands what behavior does and does not typically precede an event that matches a rule we set. As a result, the model can essentially predict whether a rule match will occur. This allows high-efficacy alerts to be generated, keeping false negatives low and avoiding the false positives that cause alert fatigue.

More specifically, on a periodic basis, Threat Stack queries our data platform, looking for events that are labeled with a rule match from a machine learning-enabled rule.  When we find such an event, we analyze event data that preceded it.  We process those events using the model that we have trained for that rule.  For that window of event data, we ask the question: “What will happen next?” The answer to that question dictates whether we generate an alert.  We’re essentially predicting whether a rule match will occur.

ThreatML with supervised learning finds anomalies as well as unexpected behaviors which present threats.

/tmp as a Real-World Cybersecurity and Supervised Machine Learning Modeling Example

A common example we use is /tmp. If we have a managed rule in place to monitor processes running out of /tmp, our platform is labeling events that match the rule and landing them in our data lake.  Those events are training a model to understand what behavior normally precedes any process running out of /tmp.  One rule, one model.

To generate alerts from the model, we follow the workflow below:

  • A periodic query of the data platform finds an event that is labeled with an ML-enabled rule match.
    • It is a syscall event showing a process running out of /tmp.
  • Once we find that rule match, we analyze a window of data that came before it.
  • In analyzing that window, we ask: “What will happen? Does this sequence of events usually result in a process running from /tmp?”
    • We call this a “running inference.”
    • If the answer is “yes,” we can assume it is not an event of interest.
    • If the answer is “no,” we can assume it is abnormal, and therefore an event of interest.
  • For an event of interest, Threat Stack generates an alert.
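
The workflow above can be sketched in a few lines of Python. This is a toy illustration only, not Threat Stack's implementation: the post does not describe the real model or its features. The frequency-count model, the names `PrecedingBehaviorModel`, `should_alert`, and `RULE_MATCH`, the event labels, the window length, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import Sequence

RULE_MATCH = "exec_from_tmp"  # hypothetical label for the /tmp rule match


class PrecedingBehaviorModel:
    """Toy per-rule model: how often is a window of events followed by a rule match?"""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self.seen = Counter()     # times each window of events was observed
        self.matched = Counter()  # times that window was followed by a rule match

    def train(self, events: Sequence[str]) -> None:
        # Slide over labeled history and record what preceded each event.
        for i in range(self.window, len(events)):
            key = tuple(events[i - self.window:i])
            self.seen[key] += 1
            if events[i] == RULE_MATCH:
                self.matched[key] += 1

    def predict(self, preceding: Sequence[str]) -> float:
        """Estimated probability that a rule match follows this window."""
        key = tuple(preceding[-self.window:])
        if self.seen[key] == 0:
            return 0.0  # never-seen behavior: the match could not be predicted
        return self.matched[key] / self.seen[key]


def should_alert(model: PrecedingBehaviorModel,
                 preceding: Sequence[str],
                 threshold: float = 0.5) -> bool:
    # The rule has already matched. Alert only if the model would NOT
    # have predicted that match from the events that came before it.
    return model.predict(preceding) < threshold


# Hypothetical workload history: a scheduled job that routinely runs from /tmp.
history = ["cron_start", "fetch_job", "unpack_job", RULE_MATCH] * 20
model = PrecedingBehaviorModel(window=3)
model.train(history)

normal = ["cron_start", "fetch_job", "unpack_job"]
odd = ["ssh_login", "curl_download", "chmod_exec"]
print(should_alert(model, normal))  # False: the /tmp execution was predictable
print(should_alert(model, odd))     # True: the /tmp execution was not predicted
```

The point of the sketch is the decision direction: the rule match is a known fact, and the model is asked whether the preceding behavior made that match expected. A predicted match is suppressed; an unpredicted one becomes an alert.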

More simply put:

We know for a fact that a process ran out of /tmp because it matched a rule. Our model tells us that a process should not have run out of /tmp, because we couldn’t predict it. This process running out of /tmp is inconsistent with what typically happens in this workload, so it is actionable and generates an alert.

Using the above rule without machine learning would be extremely time consuming and not terribly effective.  We could try to add rule suppressions for normal processes, but that would be an extensive list that grows stale and doesn’t account for automated processes that have unique names.
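
A small sketch shows why a hand-maintained suppression list struggles here. The process paths, the `SUPPRESSED` set, and the `rule_alerts` helper are made up for illustration; they are not Threat Stack rule syntax.

```python
# Hypothetical exact-match suppression list, maintained by hand.
SUPPRESSED = {"/tmp/deploy.sh", "/tmp/healthcheck"}


def rule_alerts(exe_paths):
    """Return every execution from /tmp that no suppression entry covers."""
    return [p for p in exe_paths if p.startswith("/tmp/") and p not in SUPPRESSED]


observed = [
    "/tmp/deploy.sh",          # known and suppressed
    "/tmp/ci-job-8f3a2c/run",  # benign automation with a unique name per run
    "/tmp/ci-job-b71e09/run",  # same automation, different unique name
    "/tmp/.x/payload",         # hypothetical malicious process
    "/usr/bin/python3",        # not under /tmp, so the rule never matches
]

alerts = rule_alerts(observed)
print(alerts)
# ['/tmp/ci-job-8f3a2c/run', '/tmp/ci-job-b71e09/run', '/tmp/.x/payload']
```

Two of the three alerts are noise from automation whose names change on every run, and no exact-match entry added today will cover tomorrow's run. The one alert that matters is buried among them.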

ThreatML with supervised learning knows what’s normal for a workload, even if the process name is unique, and can respond accordingly by either ignoring the behavior or generating an appropriate, high-efficacy, actionable alert, in context.

Curl as Another Real-World Cybersecurity Example

ThreatML can also help organizations detect attacks that abuse legitimate utilities. An example we often reference is the “curl” command.

Curl is a very common command in cloud-native infrastructure, and a popular utility with a variety of legitimate uses.

Could you detect the use of curl using a rule? Absolutely! You’d likely get thousands of rule matches, and thousands of alerts generated. But because curl is so common, that’s likely not a rule you would want to set, because it would lead to alert fatigue.

But what if an attacker is using curl to download and execute malware? It would be very hard to find the malicious use of curl in thousands of normal curl executions.

This is where ThreatML with supervised learning shines. It is able to learn what is normal use of curl and what is not. It can state with very high confidence: “I know curl ran. I did not predict curl would run on this workload, in this way. This is a behavior you should look at.”

Learn How ThreatML with Supervised Learning Increases Your Cybersecurity Profile

To schedule a demonstration or consultation on how ThreatML with supervised learning can be a cybersecurity solution for your organization, help with data protection and security compliance, and more, visit our contact us page.