How machine learning can improve your company's cybersecurity

Machine learning is emerging as an essential technique for detecting cyberattacks. It is a constantly evolving tool that can turn the vast amount of data collected by companies into their best defense.

Marc Torrens

Enterprise cybersecurity has become vitally important as companies of every type digitize, regardless of industry or business model. Any organization that has digitized its operations depends on its information systems, making it imperative to guarantee their security.

Machine learning is emerging as an essential technique for this task, according to Esade professor Marc Torrens in an article published in Harvard Deusto. Its usefulness lies in its ability to identify exceptional situations that may be digital attacks. In other words, it reveals contexts that are out of the ordinary and alerts the company so that it can respond appropriately.  

The fuel for machine learning is data. How effective it is will depend on the quality, as opposed to the quantity, of the information received. Specifically in the context of cybersecurity, machine learning is fueled with the data obtained from interactions in an enterprise’s information systems, including applications, network sensors, computers, protocols, etc. These interactions may be external or internal and may involve any device connected to the organization’s network.  

Since the machine-learning system has been trained to recognize the usual interactions occurring within the company, it will also be able to detect unusual patterns, i.e., potential cyberattacks.  

“Based on all this information, machine-learning techniques learn to detect anomalous situations in the systems, analyzing potential vulnerabilities in companies’ information systems,” explained Torrens. 

Too much information  

Still, the real challenge in machine learning has less to do with the learning processes than with managing the data used to train the models. This information can arrive at great speed and in huge quantities. The difficulty thus lies in suitably managing this enormous stream of data.

According to Torrens, that is why “machine learning in cybersecurity has more to do with data engineering than data science.” 

Much of the data collected by enterprises is not exploited in any way. This is what is known as dark data. At many organizations, a lot of this information is stored for regulatory reasons but is never analyzed. Yet these data can be quite useful for machine learning to detect potential cyberattacks.  

At the same time, the cost of managing and storing large quantities of data has fallen considerably over the last ten years. The widespread adoption of cloud computing has become a great ally for enterprises in managing these dark data, as it allows them to outsource the complex and costly architecture required to an external provider.

Learning from data  

According to Torrens’s classification, there are two machine-learning approaches to training a system to detect cyberattacks:

1. Supervised learning

In this case, each situation in the training data must first be labeled as belonging to one of two categories: normal or anomalous. The resulting model is thus a classifier that detects which situations are potential system vulnerabilities. The difficulty here lies in building and properly defining the set of situations considered anomalous so that the algorithm can learn to classify correctly.

The model would thus identify those anomalous situations that have a certain likelihood of vulnerability. However, there is logically a margin of error. The model can always be wrong, whether because it identifies situations as vulnerabilities that in reality are not (false positive) or because it allows situations of vulnerability to go undetected (false negative).  
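To make the idea of a labeled training set and its two kinds of error concrete, here is a minimal sketch in Python. It is not the method described by Torrens: it assumes a single invented feature (failed logins per hour), invented data, and a very simple rule that places the decision threshold halfway between the averages of the two labeled classes.

```python
from statistics import mean

# Hypothetical labeled data: (failed_logins_per_hour, label).
# Label 0 = normal, 1 = anomalous. All values are invented for illustration.
train = [(1, 0), (2, 0), (3, 0), (2, 0), (4, 0), (20, 1), (25, 1), (30, 1)]
test = [(3, 0), (15, 0), (12, 1), (28, 1)]


def fit_threshold(samples):
    """Learn a threshold halfway between the mean of each labeled class."""
    normal_mean = mean(x for x, y in samples if y == 0)
    anomalous_mean = mean(x for x, y in samples if y == 1)
    return (normal_mean + anomalous_mean) / 2


def predict(x, threshold):
    """Classify a situation as anomalous (1) if it exceeds the threshold."""
    return 1 if x > threshold else 0


def error_counts(samples, threshold):
    """Count false positives and false negatives on labeled samples."""
    false_pos = sum(1 for x, y in samples if y == 0 and predict(x, threshold) == 1)
    false_neg = sum(1 for x, y in samples if y == 1 and predict(x, threshold) == 0)
    return false_pos, false_neg


threshold = fit_threshold(train)
fp, fn = error_counts(test, threshold)
print(f"threshold={threshold:.1f} false_positives={fp} false_negatives={fn}")
```

In this toy data, the normal session with 15 failed logins is flagged (a false positive) and the anomalous session with 12 goes undetected (a false negative). Moving the threshold trades one kind of error for the other, which is why the definition of the anomalous set matters so much.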

2. Unsupervised learning

Unlike in the previous case, unsupervised learning does not require a set of labeled situations. With this approach, the system simply identifies situations that are different from the majority of cases and, thus, could be due to system vulnerabilities.
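One simple way to illustrate "different from the majority of cases" without any labels is a robust outlier score. The sketch below is an assumption chosen for illustration, not a technique named in the article: it uses the modified z-score based on the median absolute deviation, with the conventional constant 0.6745 and cutoff 3.5, applied to invented figures for kilobytes transferred per session.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return the indices of values whose modified z-score exceeds the cutoff."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return []
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > cutoff]


# Hypothetical kilobytes transferred per session; no labels are provided.
sessions = [120, 130, 125, 118, 122, 127, 900, 121]
flagged = mad_outliers(sessions)
print(flagged)
```

Here only the 900 KB session is flagged. Whether it is an attack or just a large legitimate transfer still has to be investigated, since the score only says that the session is unlike the rest.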

In any event, according to Torrens, “Whether applied to cybersecurity or any other field, machine learning is an experimental science that must constantly be revised.” This is mainly because the situations to be analyzed and potential cyberattacks are constantly changing and evolving. Therefore, the development of these practices requires constant data collection and updating of models.
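The need to keep updating a model as behavior changes can be sketched with a baseline that only remembers recent observations. This is a hypothetical illustration, not an approach taken from the article: the class name `RollingBaseline`, the window size, the cutoff, and the data are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flags values far from the mean of a sliding window of recent observations."""

    def __init__(self, window=5, cutoff=3.0):
        self.recent = deque(maxlen=window)  # oldest values drop out automatically
        self.cutoff = cutoff

    def observe(self, value):
        """Return True if the value looks anomalous, then add it to the window."""
        flagged = False
        if len(self.recent) == self.recent.maxlen:
            mu = mean(self.recent)
            sigma = pstdev(self.recent)
            flagged = sigma > 0 and abs(value - mu) > self.cutoff * sigma
        # Every observation updates the baseline, so "normal" tracks recent data.
        self.recent.append(value)
        return flagged


# Hypothetical requests per minute: activity shifts from about 10 to about 50,
# then a single reading drops back to 10.
stream = [10, 11, 9, 10, 10, 50, 50, 51, 49, 50, 50, 50, 10]
baseline = RollingBaseline()
flags = [baseline.observe(v) for v in stream]
print([i for i, f in enumerate(flags) if f])
```

The first jump to 50 is flagged, but once the window fills with the new level it becomes the norm, and the later reading of 10 is the one that stands out. The same mechanism also shows a risk: a baseline that absorbs everything it sees can come to treat a sustained attack as normal, which is one reason models need ongoing review rather than blind retraining.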

All written content is licensed under a Creative Commons Attribution 4.0 International license.