Although vendor-written, this contributed piece does not promote a product or service and has been edited and approved by Executive Networks Media editors.
The most frustrating fact in InfoSec is that evidence of attacks is present in the data, but today’s systems cannot get to it in time. As a result, they miss attacks and generate a lot of false positives.
Hiring more analysts isn’t the answer because of the costs involved and the difficulty in finding the right talent. The unemployment rate for InfoSec professionals is essentially zero. In fact, Cisco puts the worldwide shortage of InfoSec professionals at 1 million.
The trick is to emulate the human analyst, since we know humans are best at judging if something is an attack or not, and emulating a human is fundamentally the domain of Artificial Intelligence.
To mimic a human, a machine needs to learn from a human
There are a lot of machine learning technologies in InfoSec, but the key questions are: Are they mimicking the analyst? Do they learn from the analyst, and do they predict what an analyst would say when presented with a new behavior? If the answer to these questions is no, then these solutions are part of the problem and not the solution.
A system that mimics a human can be thought of as a system that generates an army of virtual analysts. Armies need leaders to direct them and train them. This is the role of human analysts, and it is a crucial role. Working together, the human analyst and army of virtual analysts cover more ground across your entire enterprise and can detect both new and emerging attacks.
To achieve artificial intelligence that can mimic an analyst, we have to combine the computer’s ability to find complex patterns on a massive scale with the analyst’s intuition to distinguish between malicious and benign patterns. This symbiotic relationship helps machines learn from humans when the machines make mistakes, and helps humans see complex patterns across extended time periods.
The challenge of a thin label space
The reason InfoSec, unlike computer vision, has failed to capitalize on AI is a lack of training data, also known as labeled data. In other words, there is a ton of data lying around that hasn’t been organized into behaviors and then labeled as either malicious or benign by an InfoSec analyst. It’s what data scientists call a thin label space. Absent labeled data, an AI system cannot learn.
But come to think of it, analysts, who are continuously judging whether the behaviors they monitor and investigate are malicious or benign, are generating labels. The problem is that these labels are not being captured today. We need to create a system that continuously captures those labels and then uses them to train new predictive models that can emulate the judgment of an analyst in real time. The predictions from these models are shown to the analyst and the feedback (label) is captured again. At each iteration of this process, the number of labeled examples available to train the system increases and, as a result, model accuracy improves.
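For readers who think in code, the feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the nearest-centroid model, the two-number behavior features, and the names `train`, `predict`, `feedback_loop` and `simulated_analyst` are all hypothetical, and the "analyst" here is a stand-in function rather than a person.

```python
from statistics import mean


def train(labeled):
    """Build a toy model: one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given behavior."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def feedback_loop(seed_labels, batches, analyst):
    """Retrain on every captured label, predict, then capture the analyst's verdict."""
    labeled = list(seed_labels)
    history = []  # (number of labeled examples, agreement with analyst) per round
    for batch in batches:
        model = train(labeled)
        agreed = 0
        for features in batch:
            guess = predict(model, features)   # prediction shown to the analyst
            verdict = analyst(features)        # analyst feedback, i.e. a new label
            agreed += guess == verdict
            labeled.append((features, verdict))
        history.append((len(labeled), agreed / len(batch)))
    return labeled, history


def simulated_analyst(features):
    """Hypothetical stand-in for a human judgment (assumption, not real data)."""
    return "malicious" if sum(features) > 10 else "benign"
```

Calling `feedback_loop` with a couple of seed labels and successive batches of unlabeled behaviors returns the growing label set along with a per-round record of how often the model agreed with the analyst. A real system would swap in a stronger model and real analyst input, but the shape of the loop (predict, show, capture, retrain) stays the same.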