Over the past two decades, small and medium-sized businesses (SMBs) have been flooded with complex, difficult-to-process data, famously known as ‘big data’, which is moving at breakneck speed and shows no signs of slowing down.
How will your organization persevere in transforming convoluted data into meaningful and actionable insights that support your business needs?
Thankfully, advances in parallel processing, distributed computing, and machine learning (ML) and deep learning (DL) algorithms have resulted in step-function increases in computational capability. In other words, artificial intelligence (AI) and ML-driven technology have made it possible to process large volumes of data and arrive at verified conclusions in record time.
A central component of cybersecurity is data protection. So why not bring data to the forefront? ActZero is a data-first company firmly rooted in proving the hypothesis that “security is a data problem”. In this post we will discuss the necessity for, and advantages of, data science in cybersecurity.
What is cybersecurity data science?
Cybersecurity data science is the practice of using algorithms to extract valuable insights from a body of structured and unstructured data. The extracted information is then used to detect, prevent, and eliminate cyber threats.
Can data science help eliminate false positive alerts?
The simple answer is yes.
At the core of data science is analytics, and the alliance of AI, ML, and data science is not only a sweet spot for evolving your security, but also a strategic move in fortifying your defenses. By learning your environment, AI- and ML-integrated data science can accurately spot patterns in data sets, identify anomalies, uncover what the human eye has missed, offer classifications, and even provide predictive analysis. With a high enough signal-to-noise ratio, you reduce the false-positive alerts generated by heuristic solutions; you can even automate responses based on such insight.
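To make the contrast with heuristic solutions concrete, here is a minimal sketch. It assumes a hypothetical series of hourly failed-login counts for a single host. The data, the fixed threshold of 10, the z-score cutoff, and all function names are illustrative assumptions, not a description of any particular product. It compares a fixed, hand-picked threshold with a baseline learned from the host's own history, and uses the fraction of alerts that are real incidents as a simple proxy for signal-to-noise.

```python
from statistics import mean, stdev


def heuristic_alerts(counts, threshold=10):
    """Flag every hour whose count exceeds a fixed, hand-picked threshold."""
    return [i for i, c in enumerate(counts) if c > threshold]


def learned_alerts(history, counts, z_cutoff=3.0):
    """Flag hours that deviate strongly from this host's own baseline."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of past behavior
    if sigma == 0:
        return []  # no variation in history, so no z-score can be computed
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z_cutoff]


def precision(alerts, true_incidents):
    """Fraction of alerts that are real incidents (signal-to-noise proxy)."""
    if not alerts:
        return 0.0
    return sum(1 for a in alerts if a in true_incidents) / len(alerts)


# Hypothetical data: a busy host that normally sees 10-14 failed logins per hour.
history = [12, 11, 13, 12, 14, 10, 12, 13, 11, 12]
today = [12, 13, 11, 14, 40, 12]  # hour 4 is a made-up brute-force attempt
true_incidents = {4}

fixed = heuristic_alerts(today)
learned = learned_alerts(history, today)

print("heuristic alerts:", fixed, "precision:", precision(fixed, true_incidents))
print("learned alerts:  ", learned, "precision:", precision(learned, true_incidents))
```

On this toy data the fixed threshold fires on every hour, because the host is simply busier than the threshold assumes, while the learned baseline flags only the hour that is unusual for this environment. Real deployments would use far richer features and models; the point is only that a baseline learned from the environment produces fewer false positives than a one-size-fits-all rule.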
Read our whitepaper: The ‘Hyperscale SOC’ and the minds behind it for an inside look into the first-hand experiences of the industry professionals who collaboratively built a new kind of cybersecurity company on the foundation of state-of-the-art data science. It promises valuable insights that will help your organization avoid the pitfalls of modernizing your security operations.
How does cognitive bias affect cybersecurity outcomes?
In the hands of a subject matter expert (SME), data sets are more likely to be analyzed through cognitive biases, i.e., a subjective reality shaped by expert knowledge, while a generalist researcher is more likely to let the data tell the story, i.e., to stay objective.
Why are we biased?
Glad you asked. Cognitive biases are the result of our brains’ attempt to simplify and quicken decision making. Our judgments and decisions are the sum of our past experiences, lessons learned, knowledge base, and habits. These biases generally fall into two categories:
- Memory-related: I didn’t like the outcome, so I remember the events leading up to it in an unfavorable light.
- Attention-related: I didn’t want the outcome, so I focus more on data that supports my point of view.
We have all encountered situations where our beliefs led to snap judgments. If left unchecked, our subject matter expertise only entrenches those beliefs further. For example, how likely are you to buy a new version of an anti-virus with 5-star ratings if previous versions have already failed to protect you from a ransomware attack?
If we guessed right, your answer is “not very likely.”
This concept is no different from an SME assessing data sets through the lens of their expert knowledge.
How to minimize bias in cybersecurity outcomes
Cognitive biases are a fact of life. However, there are best practices that you can implement when building a data-first organization to mitigate them.
‘X’ is a data problem
First, start by completely adopting the hypothesis that “X is a data problem”. This is a standard we uphold at ActZero, regularly testing any event under our purview in the context of this hypothesis, i.e., can we prove the event exists through data? If yes, show it to us.
We do not ignore the possibility of scenarios where “X is not a data problem”. In fact, we embrace it because it gives our SMEs a safe environment in which to fail and to constructively confront their biases. After all, isn’t it a win for our business if the SME can prove that X is not a data problem? Regardless, because we use top-down support, we purposefully introduce a slight bias into the hypothesis, incentivizing the organization toward finding supporting data first.
Have a collaborative partnership
Secondly, place the data team alongside the SMEs, not underneath them. This partnership structure ensures that the data team is not unduly influenced into considering or discounting specific solutions. Past attempts and failures should only be inputs, i.e., lessons learned, not the directive.
The SMEs optimize the tools available to them and can request justification for any change they are asked to make, while the data team delivers new tools and solutions and can request whatever data is germane to its efforts.
We recommend a clearly defined interface. Although both teams have joint ownership of the problem, the SME is teacher and operator, while the data team is student and solution owner.
Use a collaborative goal-setting methodology
Lastly, incorporate a goal-setting framework to promote unified team performance, keep a pulse on engagement around measurable goals, and proactively assess impact. Here is an example using the Objectives and Key Results (OKR) goal-setting framework.
OBJECTIVE: To improve SME work quality/speed through data
KEY RESULTS for data team:
- Spend X learning sessions with SMEs
- Introduce N improvements to SME tools
- Improve an SME’s average signal-to-noise ratio from X to Y
- Reduce an SME’s event handling time from X to Y
- Achieve an X out of 10 user feedback score from SMEs
KEY RESULTS for SMEs:
- Conduct X mentor sessions with data team members
- Provide data-proof substantiating each of X event-types
- Conduct root-cause-analysis for each unsubstantiated major event
- Adopt N new tool improvements within X time
- Provide user feedback on Y tool improvements
Codifying the relationship in goals like these will help with tracking progress and outcomes.
The importance of team selection cannot be overstated. Aim for complementary teams with a penchant for collaboration. SMEs who are unable to explain their responsibilities and challenges, are not team players, or are intolerant of change will cause major friction. Data teams that are neither quick studies nor empathetic will produce ineffective solutions and erode trust.
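One way to track such goals is to record each key result with a starting value, a target, and a current value. The sketch below is only an illustration under stated assumptions: the `KeyResult` class, the unweighted average, and every number in it are hypothetical placeholders for the X, Y, and N values above, not figures from ActZero.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    owner: str          # "data team" or "SMEs"
    description: str
    start: float        # value when the goal was set
    target: float       # value we want to reach (may be lower than start)
    current: float      # latest measured value

    def progress(self) -> float:
        """Share of the start-to-target distance covered, clamped to [0, 1]."""
        span = self.target - self.start
        if span == 0:
            return 1.0
        return max(0.0, min(1.0, (self.current - self.start) / span))


def objective_progress(key_results):
    """Unweighted average of key-result progress (an assumed scoring rule)."""
    return sum(kr.progress() for kr in key_results) / len(key_results)


# Hypothetical values standing in for the X, Y, and N placeholders.
key_results = [
    KeyResult("data team", "Learning sessions held with SMEs", 0, 8, 6),
    KeyResult("data team", "SME event handling time (minutes)", 30, 20, 25),
    KeyResult("SMEs", "Mentor sessions held with data team", 0, 4, 4),
]

for kr in key_results:
    print(f"{kr.owner:9} | {kr.description:36} | {kr.progress():.0%}")
print(f"Objective progress: {objective_progress(key_results):.0%}")
```

Because progress is measured as distance covered from start to target, the same calculation works for key results that should go up (sessions held) and those that should go down (event handling time).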
A practical example of modern data science in cybersecurity
At ActZero, our data team comprises three classic roles: data scientists, ML engineers, and data engineers. The data scientists model behavior and improve signal-to-noise. ML engineers bring these models to production and scale them widely. Data engineers feed the models (and the team) with correctly formatted data at low latency. Although individually the best at their craft, they are not security experts.
Our SMEs bridge two disciplines. The SOC employees protect customers, implement tools, and verify efficacy, while the security researchers generate data on new or emergent threats.
Security experts are unaccustomed to answering questions about their intuitions, but we encourage our data team to query the runbooks for enhanced understanding. To mitigate friction, we incorporate structured interactions and use brainstorming techniques to encourage ongoing collaboration, problem solving, and idea generation.
By qualitatively observing senior security experts, the data team modeled their expertise. The resulting first-iteration models were remarkable, and together they formed refined software that delivered better results than the security team could produce on their own.
Conclusion:
Highly tuned, trusted, and productive data teams are rare. Rallying around a shared hypothesis of “X is a data problem” is one way to align your organization and achieve the desired return on your data science investment. Wielding this hypothesis as the driver of collective effort can dispel cognitive bias and produce innovation where no one thought it possible.
The genius of ML-driven data science in cybersecurity is not in the elimination of the human touch, but in the enhancement of it. We want to help you connect the dots and bridge your security gaps. To survey your current security solutions and identify improvements in your technology, tools, and procedures, schedule your Ransomware Readiness Assessment today.