How Data Science Can Save You From a Heuristics Headache

With AI as overhyped as it is, it’s tough to blame people for thinking that they might be able to achieve a similar outcome using rules or basic statistics (the folks you should really blame are the marketing people!). That being said, in this blog post I’m going to explain why these simple heuristics pale in comparison to what a proper data science-fueled machine learning algorithm can do.

What do we mean by a Statistical / Analysis approach?

Many cybersecurity detections begin as simple queries or rules that originate from either known incidents or expert knowledge. These simple rules or heuristics can be great! They are usually informed by evidence and often tailored to a specific company’s environment. These detections can range from string matches for known indicators of attack (IOAs) to statistical approaches that analyze whether a given event or set of events falls into a typical distribution for the endpoint or environment.
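To make those two ends of the range concrete, here is a minimal sketch in Python. It is only an illustration: the indicator list, the function names, and the three-standard-deviation threshold are all assumptions of mine, not rules from any real product.

```python
from statistics import mean, stdev

# Hypothetical indicator list; a real IOA feed would be far larger.
KNOWN_BAD_STRINGS = ["bad.exe", "evil.ps1"]


def matches_indicator(command_line):
    """String-match rule: flag a command line containing a known indicator."""
    lowered = command_line.lower()
    return any(indicator in lowered for indicator in KNOWN_BAD_STRINGS)


def is_statistical_outlier(history, observed, threshold=3.0):
    """Flag a count that sits more than `threshold` sample standard
    deviations away from the historical mean for this endpoint."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Assumed daily counts of PowerShell launches on one endpoint.
daily_counts = [10, 12, 11, 9, 10, 11, 12]
print(matches_indicator("powershell -w 1 download(BAD.exe)"))
print(is_statistical_outlier(daily_counts, 40))
```

Both checks are easy to write and easy to reason about, which is exactly why detections so often start out this way.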

Think statistics/analysis is sufficient? Think again

Despite their advantages, there are significant drawbacks to these basic analytical approaches. Static rules and heuristics risk becoming stale. When a query is written against published indicators of attack (IOAs), for example, an attacker need only rename the file, or alter it slightly so that its hash changes, to evade detection. Distributions of events and commands may drift over time, with the number of alerts slowly creeping up as a result. And many IT admins can likely relate to the ever-increasing length of the allowlists some detections need in order to function as a company grows and diversifies.
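The hash case is easy to demonstrate. The sketch below assumes a blocklist keyed on SHA-256 digests and uses placeholder bytes rather than a real sample; the names are hypothetical.

```python
import hashlib


def sha256_hex(payload):
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(payload).hexdigest()


# Placeholder bytes standing in for a published malicious file.
original = b"placeholder payload bytes"
BLOCKLIST = {sha256_hex(original)}


def is_blocklisted(payload):
    """Static rule: exact match against published hashes."""
    return sha256_hex(payload) in BLOCKLIST


# Appending a single byte yields a completely different digest.
modified = original + b"\x00"
print(is_blocklisted(original))
print(is_blocklisted(modified))
```

The rule itself is not wrong; it simply has no way to notice that the second file is almost identical to the first.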

Even user and entity behavior analytics (UEBA) is necessarily limited by the size of the entity pool from which the analysis is drawn. Behaviors that are new to a given user or environment may be perfectly benign, but get flagged due to their novelty alone. (While some UEBA products are based on data science methods like anomaly detection, many are built on basic statistics, and the latter are susceptible to false positives each time a user acquires new software or learns a new skill.)

A better approach: Anomaly Detection

Anomaly detection can complement a statistical or rules-based approach to mitigate certain drawbacks. In anomaly detection, the machine learning (ML) algorithm looks for outliers - in other words, anything that looks “weird.” Humans (and cybersecurity professionals especially) are natural anomaly detectors. Think of a time you’ve sifted through a bunch of data looking for that “needle in a haystack,” without thinking about the exact parameters of what you are looking for. But we can’t expect humans to sift through the quantity of alerts generated in modern environments. Thankfully, anomaly detection algorithms work in much the same way, learning what is normal for an environment without ever being given strict boundaries.
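As a rough illustration of "learning what is normal" without hand-written boundaries, here is a small distance-based outlier score in Python. To be clear, this is a generic k-nearest-neighbor technique I am using as a stand-in, not a description of our production models, and the two features (command length and count of special characters) and all the numbers are assumptions.

```python
import math


def knn_score(point, baseline, k=3):
    """Mean Euclidean distance from `point` to its k nearest baseline points."""
    distances = sorted(math.dist(point, other) for other in baseline)
    return sum(distances[:k]) / k


def fit_threshold(baseline, k=3, margin=1.5):
    """Derive a cutoff from the baseline itself: score each point against
    the others (leave-one-out) and allow some margin above the worst case."""
    scores = [
        knn_score(p, baseline[:i] + baseline[i + 1:], k)
        for i, p in enumerate(baseline)
    ]
    return margin * max(scores)


def is_anomalous(point, baseline, threshold, k=3):
    """A point is anomalous if it sits far from everything seen so far."""
    return knn_score(point, baseline, k) > threshold


# Assumed (command length, special-character count) pairs seen in normal use.
BASELINE = [(20, 1), (22, 2), (25, 1), (21, 1), (24, 2), (23, 1)]
THRESHOLD = fit_threshold(BASELINE)

print(is_anomalous((23, 2), BASELINE, THRESHOLD))
print(is_anomalous((90, 14), BASELINE, THRESHOLD))
```

Nobody told the code that "90 characters is too long"; the cutoff falls out of the observed data. A real model would scale its features and use far more of them, but the idea is the same.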

Our anomaly detection algorithms use characteristics similar to what a cybersecurity expert would look at, or even to what a statistical approach might use, but with a complexity that would be challenging to write into a statistical heuristic and an ability to process far more events than a human. In addition, our anomaly detection models can go beyond what is normal vs. weird in a specific user environment to analyze trends across businesses similar to yours - we’ll come back to an example of why that’s useful in a minute.

Let’s look at some concrete examples. In the following scenarios, we’ll look at detections involving PowerShell commands. (For more information on PowerShell and other scripting that can be used maliciously, check out our Threat Insight.) In these examples, I will contrast the use of traditional rule- or heuristic-based approaches to detecting specific malicious PowerShell techniques with an anomaly detection approach.

Example 1: Avoiding a False Negative from Obfuscation

Malicious actors often try to trick rule-based detections by throwing in obfuscating characters - for example, a command line that begins “-w 1 dow`nlo`ad(bad.exe)” might trick a simple string match looking for the term “download.” A common countermeasure would be to preprocess the command line, removing unusual characters or punctuation and thereby increasing the likelihood of correctly matching specific words against the detection rules. The problem is that this approach risks throwing the baby out with the bathwater. In this case, the command is anomalous precisely because it has those extraneous characters in the middle of a word. Additionally, many script obfuscators will use elements like environment variables that are difficult to process safely. An anomaly detection model can pick up on the strange way the word is split, the presence of unusual characters, and the presence of the word “download” simultaneously, greatly increasing the probability of a detection.
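Here is a small Python sketch of that trade-off, using the command line from the example above. The function names and the choice of features are mine and purely illustrative, and the cleanup step strips only backticks, which is a simplification of real preprocessing.

```python
import re

COMMAND = "-w 1 dow`nlo`ad(bad.exe)"


def naive_match(cmd):
    """Simple string match for the term 'download'."""
    return "download" in cmd.lower()


def strip_and_match(cmd):
    """Preprocess by removing backticks, then match.
    The match succeeds, but the evidence of obfuscation is thrown away."""
    return "download" in cmd.replace("`", "").lower()


def obfuscation_features(cmd):
    """Keep both signals: the hidden keyword and how it was hidden."""
    return {
        # Backticks with a word character on both sides split a word.
        "backticks_inside_words": len(re.findall(r"(?<=\w)`(?=\w)", cmd)),
        "keyword_in_raw_text": naive_match(cmd),
        "keyword_after_cleanup": strip_and_match(cmd),
    }


print(naive_match(COMMAND))
print(strip_and_match(COMMAND))
print(obfuscation_features(COMMAND))
```

A model that sees all three features at once can treat "keyword only appears after cleanup" plus "word split by backticks" as a much stronger signal than either one alone.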

Example 2: Avoiding False Positives from Benign Processes

On the flip side, because many legitimate PowerShell scripts are so functionally similar to malicious PowerShell, they can often cause false positives on heuristic-based detection systems. Take this PowerShell command, common on machines running Visual Studio software:

“-NoProfile -InputFormat None -ExecutionPolicy Bypass -Command [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; iex ((New-Object System.Net.WebClient).DownloadString('https://awebsite/install.ps1')); software1 upgrade -y python visualstudio2019-workload-vctools; Read-Host 'Type ENTER to exit'”.

It has multiple parameters that are often seen in attacks (ExecutionPolicy Bypass, WebClient, Download), and so could run afoul of rules-based queries that pick up on combinations of these indicators. If your company uses this software often, this file might be allowlisted. But if it’s an uncommon package in your environment or the name of the installation file recently changed, a heuristic detection might lead to false positives. An anomaly detection model trained across your full environment and many others like yours is much more likely to have seen this before and correctly classify it as normal behavior.
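To see why, consider this Python sketch. The first function is a typical indicator-combination rule. The second is a crude stand-in for what a model trained across many environments learns: it masks the URL and asks how many environments have seen the same command shape. The token list, the 50% share, and the sample environments are all hypothetical.

```python
import re

VS_COMMAND = (
    "-NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "
    "[Net.ServicePointManager]::SecurityProtocol = "
    "[Net.SecurityProtocolType]::Tls12; iex ((New-Object "
    "System.Net.WebClient).DownloadString('https://awebsite/install.ps1')); "
    "software1 upgrade -y python visualstudio2019-workload-vctools; "
    "Read-Host 'Type ENTER to exit'"
)

# Hypothetical indicators a rules-based query might combine.
SUSPICIOUS_TOKENS = ("executionpolicy bypass", "webclient", "downloadstring")


def rule_flags(cmd, min_hits=2):
    """Heuristic: flag when enough suspicious indicators co-occur."""
    lowered = cmd.lower()
    return sum(token in lowered for token in SUSPICIOUS_TOKENS) >= min_hits


def template_of(cmd):
    """Mask URLs so a renamed install file keeps the same command shape."""
    return re.sub(r"https?://[^\s'\"]+", "<URL>", cmd.lower())


def seen_widely(cmd, environments, min_share=0.5):
    """True if enough environments have already seen this command shape."""
    template = template_of(cmd)
    seen = sum(template in env for env in environments)
    return seen / len(environments) >= min_share


# Made-up data: two of three environments run a variant of this command.
renamed = VS_COMMAND.replace(
    "https://awebsite/install.ps1", "https://mirror.invalid/setup.ps1"
)
ENVIRONMENTS = [{template_of(VS_COMMAND)}, {template_of(renamed)}, set()]

print(rule_flags(VS_COMMAND))
print(seen_widely(VS_COMMAND, ENVIRONMENTS))
```

The rule fires on the benign installer, while the cross-environment view recognizes the command shape even though the install file name differs.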

While it is theoretically possible to classify both of these scenarios correctly using heuristics and statistics, achieving the desired mix of precision and recall would require an ever-more-complex set of rules, one that reacted to each false positive or false negative. Layering rules like that generally results in low numbers of false positives, but also leads to missing real threats. In contrast, anomaly detection is both flexible and powerful, combining knowledge of your unique environment with other data to detect attacks while surfacing far fewer false positives.

For more information on how we’re applying machine learning to cybersecurity, check out this podcast featuring our Head of Data Science, Alexis Yelton. Or, to see our ML-driven detections in action, request a demo of our Managed Detection and Response service.

Topic: Cybersecurity Industry
