AI – and machine learning in particular – has been touted as the solution to myriad security problems, from detecting advanced threats to easing the cyberskills crisis.
When a ransomware syndicate launches a million attacks, they count a success rate of just 0.001% (10 attacks) as a victory. Defenders, however, need to repel 100% of attacks or risk serious consequences for their organization. And that’s simply not something that humans alone can do.
AI is therefore seen as a way of cutting through the noise to help analysts in security operations centers focus on the incidents that truly threaten their organization’s security posture.
But is it working? Or is AI creating even more unnecessary noise? The answer lies in how the technology is implemented rather than in the technology itself.
Just as the mere existence of a firewall doesn’t stop cyberattackers at your system perimeter, adding AI to a solution without considering your threat environment and tuning the algorithm accordingly will lead to an excess of false positives – or, even worse, too many false negatives.
For AI to be effective, it must be trained with a lot of data
Management school MIT Sloan describes AI and machine learning as different but closely related disciplines: although the terms are often used interchangeably, machine learning is “a subfield of AI that gives computers the ability to learn without explicitly being programmed”. So, machine learning is just one way of achieving AI, yet it is the dominant one today.
The key to successful AI is having a good fit between past data and a future event. Broadly, once an algorithm has been defined, it is fed large amounts of historical data to fine-tune its output. If this doesn’t work as expected, it adjusts the weightings of factors in its equations, then runs again with the data sets until it achieves a good fit. It continues to be updated with real data once it’s operational.
An algorithm can predict an outcome accurately only if it has enough definitive past data to infer a future event.
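As a rough illustration of that fit-adjust-repeat loop, here is a minimal sketch of a logistic regression trained by gradient descent. The feature names, the toy history and the hyperparameters (`epochs`, `lr`) are all assumptions made up for this example, not something taken from a real security product.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Estimated probability that event x is an attack."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, epochs=2000, lr=0.5):
    """Repeatedly run over the historical data, nudging each weight
    in the direction that reduces the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical history: [scaled failed-login rate, off-hours flag]
history = [[0.1, 0], [0.2, 0], [0.15, 1], [0.9, 1], [0.8, 0], [0.95, 1]]
was_attack = [0, 0, 0, 1, 1, 1]

weights, bias = train(history, was_attack)
print(round(predict(weights, bias, [0.95, 1]), 3))  # high-rate, off-hours event
print(round(predict(weights, bias, [0.1, 0]), 3))   # low-rate, office-hours event
```

With only six past events the model can separate this toy data, but it has seen almost nothing of what real traffic looks like, which is why the volume and quality of historical data matter so much.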
AI must balance false positives and false negatives
As Google’s manual on model training illustrates, the lessons from Aesop’s fable “The Boy Who Cried Wolf” are relevant to the growing use of AI in cybersecurity.
In the story, a bored young shepherd shouts, “Wolf! Wolf!” The villagers come rushing out to help him, only to discover there is no wolf. He does this so many times that when a wolf really does show up, his cries for help are ignored – and the flock of sheep is in real danger.
If the algorithm asks, “Is there a wolf among the flock?”, we may get these answers:
- True positive: A wolf threatened the flock, the shepherd cried, “Wolf!”, and the shepherd is a hero.
- False positive: There was no wolf, the shepherd cried, “Wolf!”, and the villagers are angry.
- False negative: A wolf threatened the flock, the shepherd said, “No wolf!”, and the wolf ate all the sheep.
- True negative: There was no wolf, the shepherd said, “No wolf!”, and everyone is fine.
We want our algorithm to return only true positives and true negatives. In security terms, a true positive is when an attack is correctly identified as such. A false positive occurs when the AI solution classifies event data as an attack when there isn’t one. A false negative is a real attack that the solution fails to flag.
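The four outcomes can be tallied mechanically once each event has a ground-truth label and a recorded alert decision. The sketch below assumes a hypothetical list of `(is_attack, alerted)` pairs; the function name and the sample events are invented for illustration.

```python
from collections import Counter

def classify_outcome(is_attack: bool, alerted: bool) -> str:
    """Map one event to its cell in the wolf/no-wolf grid."""
    if alerted:
        return "true positive" if is_attack else "false positive"
    return "false negative" if is_attack else "true negative"

# Hypothetical events: (was there really an attack?, did the tool alert?)
events = [
    (True, True),    # wolf, shepherd cried "Wolf!"
    (False, True),   # no wolf, shepherd cried "Wolf!"
    (True, False),   # wolf, shepherd said "No wolf!"
    (False, False),  # no wolf, shepherd said "No wolf!"
    (False, False),
]

counts = Counter(classify_outcome(is_attack, alerted) for is_attack, alerted in events)
for outcome, n in counts.items():
    print(f"{outcome}: {n}")
```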
If we tune the algorithm to suppress too many false positives (we ignore the boy crying wolf), it will also produce more false negatives (we may miss the actual wolf because the algorithm fails to expose it to the analyst).
The reason is that cybersecurity data is rarely definitive. The point at which something can categorically be classified as a cyberincident (a true positive) isn’t always clear-cut.
Attackers may mask their behavior with noise or pretend to be conducting legitimate business. And once you’re certain something is an attack, it may already be too late to stop it. On the flip side, the earlier we alert an analyst to a possible incident, the more ambiguous the underlying data can be.
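One way to see this trade-off is to sweep an alert threshold over scored events and count the errors at each setting. This is a simplified sketch: the scores, labels and thresholds are made-up values, and it assumes the tool emits a single suspicion score per event.

```python
def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) when alerting on score >= threshold."""
    fp = sum(1 for s, attack in zip(scores, labels) if s >= threshold and not attack)
    fn = sum(1 for s, attack in zip(scores, labels) if s < threshold and attack)
    return fp, fn

# Hypothetical suspicion scores and whether each event really was an attack.
# Note the overlap: one attack scores 0.40, one benign event scores 0.55.
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90]
labels = [False, False, False, True, False, True, True, True]

results = {t: count_errors(scores, labels, t) for t in (0.3, 0.5, 0.7)}
for threshold, (fp, fn) in results.items():
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

On this toy data, raising the threshold from 0.3 to 0.7 removes both false alarms but misses two real attacks. Because the scores of benign and malicious events overlap, no single cut-off eliminates both kinds of error.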
Hence, AI algorithms must be able to accept that the answer to the question “Is this a security breach?” is often “Maybe” rather than a simple yes or no – and this is important. If we ignore every “Maybe”, an otherwise preventable attack may well break through our cyberdefenses.
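A simple way to keep the “Maybe” is to use two cut-offs instead of one, so that ambiguous events are routed to an analyst rather than silently dropped. The function below is an illustrative sketch; the name `triage` and both threshold values are assumptions, and in practice they would be tuned to your own threat model.

```python
def triage(score: float, dismiss_below: float = 0.2, escalate_at: float = 0.8) -> str:
    """Sort an event into three buckets instead of a yes/no answer."""
    if score >= escalate_at:
        return "attack"   # confident enough to respond immediately
    if score < dismiss_below:
        return "benign"   # confident enough to ignore
    return "maybe"        # ambiguous: queue for analyst review

for score in (0.05, 0.45, 0.93):
    print(score, "->", triage(score))
```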
5 ways to operationalize AI
The future of cybersecurity will include a healthy dose of AI. But we should not see AI as the only solution. Instead, we need to take a balanced, threat-led approach to implementing it in our environments.
Here are five key areas to focus on:
- Infuse data and AI skills into your security operations center (not just the IT or project team) so that you can keep your algorithms close to your operational knowledge.
- Use AI only where it makes sense. Machine learning won’t fit every use case, so be selective rather than trying to apply it universally. Just because you can, doesn’t mean you should.
- Treat AI models like you treat your staff and business processes. Focus on using open models, not “black box” algorithms that can’t be interpreted, analyzed, tuned and retrained before, during and after cyberincidents. AI needs to learn and adapt based on both your mistakes and your successes.
- Pay attention to the threat, not the noise. When you’re training algorithms, the focus should be on getting the basics right first: do your threat modeling and understand what you’re trying to defend your organization against. Then, train your models to maximize precision and recall against these threats, not to minimize noise or false positives. In other words, focus on the wolf, not the boy. You should be concerned if your algorithm is returning no noise at all, as this means it’s probably missing legitimate attempted cyberattacks.
- Exercise and update your threat models and AI algorithms regularly as part of your security operations center procedures. Just as your staff require regular training, so do your algorithms.
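For readers less familiar with the precision and recall mentioned in the fourth point, here is a minimal sketch using the standard definitions: precision is the share of alerts that were real attacks, and recall is the share of real attacks that triggered an alert. The two sets of counts are hypothetical and only serve to contrast a very quiet model with a noisier one.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything flagged as an attack, what fraction really was one?"""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    """Of all real attacks, what fraction did the model flag?"""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts over the same 10 real attacks.
quiet = {"tp": 2, "fp": 0, "fn": 8}    # almost no noise, misses most attacks
noisy = {"tp": 9, "fp": 30, "fn": 1}   # many false alarms, misses one attack

for name, c in (("quiet", quiet), ("noisy", noisy)):
    p = precision(c["tp"], c["fp"])
    r = recall(c["tp"], c["fn"])
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

The quiet model looks perfect if you only count false alarms, yet it catches just two of the ten attacks. Neither extreme is the goal, which is why both measures need to be tracked against the threats that matter to you.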
AI has already changed the cyberindustry for the better, and its potential is almost limitless. By focusing on the right areas (the proverbial wolf), we can build trust in cyberdefenders, accelerate our AI-fueled transformation to machine speed – and keep the wolves at bay.
Dirk Hodgson is Director: Security Practice at NTT