Businesses are facing an influx of different data types, which means they need to decide how to use the resulting insights in a sustainable and repeatable way.
Data is only valuable if it can be translated into actionable insights, and with more data available, companies have more to manage.
Yossi Naar, Chief Visionary Officer and Cofounder, Cybereason, said: “In security data analysis, hunting and AI-driven automated detection, the quality of your results depends heavily on the quality of your data. However, we often find ourselves having to process more data than we or our systems can handle.
“Depending on the organisation, there are essentially three approaches to managing and processing the data: sampling – analysing a statistically significant subset of the data; filtering – removing data that we deem unimportant or repetitive, and scaling up – finding tools and technologies that will allow us to process all data in an effective way.
“Sampling can be an effective way to learn about the statistical nature of the data, but it’s not very useful if you’re looking for a needle in a haystack or if you require access to specific data. Filtering is a good strategy when you have high certainty that your filtering methods are reliable, do not change the statistics of the data collected and can be guaranteed to retain all important data.”
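Naar's point about sampling and needles in haystacks can be illustrated with a short calculation. The sketch below is not from Cybereason; it assumes the simplest possible scheme, in which every record is kept independently with the same probability, and the function name and parameters are hypothetical.

```python
def chance_of_seeing_needle(sample_rate: float, needle_count: int = 1) -> float:
    """Probability that at least one rare record survives sampling.

    Assumes each record is kept independently with probability
    `sample_rate` (a simplifying assumption for illustration only).
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if needle_count < 0:
        raise ValueError("needle_count must not be negative")
    # Every needle is missed with probability (1 - sample_rate);
    # all of them are missed with that probability raised to needle_count.
    return 1.0 - (1.0 - sample_rate) ** needle_count


if __name__ == "__main__":
    for needles in (1, 10, 100):
        p = chance_of_seeing_needle(0.01, needles)
        print(f"1% sample, {needles} malicious record(s): {p:.1%} chance of seeing any")
```

Under these assumptions, a single malicious record in a 1% sample is seen only about one time in a hundred, however large the sample is overall. That is why sampling suits statistical questions better than hunting for specific events.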
He said that ‘smart’ filtering sounds better than it really is.
Naar added: “To help illustrate my point, let’s delve into the filtering approach to data collection – a favourite approach for several organisations and security vendors. Say that we limit the collection of network data to 100 connections for every process. On the surface, this sounds reasonable. The average number of connections per process is much lower on most endpoints, so you can expect to filter very little of the data.
“However, in reality, data patterns in computerised environments follow aggressive power law distribution and not linear or even natural distribution.
“As a consequence of this behaviour, any type of cap-based filtering will remove the vast majority of the data. Even if you try and factor in malicious behaviour into the filtering algorithm – what some vendors call ‘smart filtering’, there are still several issues.”
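The effect of a per-process cap on heavy-tailed data can be sketched with synthetic numbers. The example below is illustrative only: it assumes a Zipf-like distribution in which the process ranked r makes roughly top / r connections, and the function names, the population of 1,000 processes and the busiest-process figure of 10,000 connections are all hypothetical. Only the cap of 100 comes from Naar's example.

```python
def zipf_like_counts(n_processes: int = 1000, top: int = 10_000) -> list[int]:
    """Synthetic connection counts: the process ranked r gets top // r connections.

    A hypothetical stand-in for a power-law distribution, not real telemetry.
    """
    return [top // rank for rank in range(1, n_processes + 1)]


def cap_filter_stats(counts: list[int], cap: int = 100) -> dict[str, float]:
    """Summarise what a per-process cap keeps and drops."""
    total = sum(counts)
    kept = sum(min(c, cap) for c in counts)          # each process keeps at most `cap`
    over_cap = sum(1 for c in counts if c > cap)     # processes the cap actually touches
    return {
        "mean_per_process": total / len(counts),
        "share_of_processes_capped": over_cap / len(counts),
        "share_of_data_dropped": (total - kept) / total,
    }


if __name__ == "__main__":
    stats = cap_filter_stats(zipf_like_counts(), cap=100)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

In this synthetic data set the mean number of connections per process is below the cap and only about one process in ten is affected, yet more than half of all connections are discarded. The discarded records all come from the busiest processes, which is the pattern the quote describes. A steeper tail than the one assumed here would push the dropped share higher still.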
He added: “As such, there is nothing about filtering that is smart. It’s not designed to reduce ‘noise’. It’s merely a strategy to overcome technological limitations of the server-side systems and save on the cost of the solution. However, this comes at a significant cost to the integrity of the data, the quality of detection and the security value provided by the system. When you apply arbitrary/smart/statistical filtering, you will inevitably introduce blindness to your system. And hackers will exploit it – either deliberately by understanding how you made your decision or by accident – because you can never have 100% certainty on what particular piece of data can be completely ignored.”