It can take months or even years before a data breach is detected. The latest statistics from Ponemon Institute’s 2018 Cost of Data Breach Report show that it takes an average of 197 days to identify a breach. That means someone is in your network, on your systems and in your applications for over six months before they’re detected, if they’re detected at all. That’s six months! At the higher end of the same report, there are companies that were breached for years before they realized it. For example, sources indicate the Marriott data breach began back in 2014, but it was not disclosed until 2018. The scale of that breach is still being evaluated, and it seems to get bigger and more impactful as more information is discovered.
Why is this the case? Most companies have all the hardware and data they need to detect a breach. All of the attackers’ activities are in the audit files, log files and other sources of data required by security frameworks and compliance policies to gather and store. Yet it still takes six months to detect. It all comes down to the data.
In 2019, data is being generated at a volume and velocity never seen before. Some have estimated that we will generate more data this year alone than in the past 5,000 years combined. To put this into perspective, on a daily basis:
- 500 million tweets are sent
- 294 billion emails are sent
- 4 petabytes of data are created on Facebook
- 4 terabytes of data are created from each connected car
- 65 billion messages are sent on WhatsApp
- 5 billion searches are made
Now factor in all of the network devices, wearables, IoT devices and sensors, cameras, drones, blockchain and everything else, and it is easy to see how much new data is being generated. Pair that with your organization and its infrastructure, applications and people, and it all starts to come together.
Hackers know this. They use this extreme volume of data as cover. If they can be patient and hide their malicious activity in the immense volume of data to make it appear as innocuous user behavior, chances are their activity will go undetected for long stretches of time. It is becoming harder and harder for humans to search through all of this data to find a needle in a haystack while more hay is being piled on.
In 2018, the average cost of a breach was nearly $4 million (USD). For larger organizations, that number can reach the tens of millions or more. For a small business, a data breach could be the difference between staying in business and having to close its doors. Like an undiagnosed illness, the longer a data breach remains undetected, the more damage it can cause.
Unfortunately, small businesses are often unaware they’ve been compromised until another party informs them. For example, this can happen if a financial institution discovers a sudden rise in cardholder fraud and traces the source back to a single merchant. More than 70 percent of attacks target small businesses, and an estimated 60 percent of those that experience an attack will likely go under within six months. Businesses that operate in the retail, food and beverage and hospitality industries are the most susceptible to a compromise. One third of small businesses do not have the right tools or resources in place to protect against a breach. Again, attackers know this and use it to their advantage. Small businesses are low-hanging fruit, which is why we have seen such a sudden spike in ransomware attacks. The necessary protections are missing, making it far too easy to get in, get the money, and get out. Attackers ask for a ransom that is just large enough to make it worth their time, and damaging enough to make any business owner give security a serious thought.
Beware of the Insider – Everyone Has a Price
How are these breaches occurring?
The same Ponemon study found that a majority of the attacks come from external cyber criminals, but interestingly enough, the next largest share comes from malicious insiders. These are employees with responsibilities within an organization who misuse their trust or access. It’s important to be aware of this type of threat because it is typically very difficult to detect and often takes a long time to discover.
The malicious insider threat is hard to detect because we typically trust our employees. If an employee is working with sensitive data as part of their job, it’s very difficult to determine if they are doing anything malicious with the data. Even if you suspect malicious intent, it’s easy for employees to claim that they simply “made a mistake” and get away with it. It is almost impossible to prove guilt in these cases. It’s pretty easy for tech-savvy employees to cover their tracks.
What Motivates the Malicious Insider?
Being aware of certain factors and indicators can help determine if you have a malicious insider threatening your organization:
- Is data being accessed, copied or deleted when there is no business justification?
- Is data being transferred out of the organization through file uploads, email and/or physically on media?
- Are changes to access occurring for file locations or inside of business applications that have no business justification?
- Are deactivated or terminated employee accounts being activated?
- Are unauthorized areas being accessed?
Each of these activities by itself could be benign. This is where we need context to help paint a broader picture. Malicious insiders are often driven by greed, anger, lack of recognition, ideology, ego, financial need, compulsive behavior and much more. This is context.
Look at all these different factors, characteristics and behaviors, and then see if a pattern emerges. It could be that they are just ambitious. Context will help you recognize the difference.
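To make the idea of a pattern emerging a little more concrete, here is a minimal sketch in Python. It assumes you already have events tagged with a user and an indicator; the indicator names, the `users_to_review` function and the threshold of three are all hypothetical, chosen only to mirror the questions above. It does not judge intent. It only surfaces users who show several different indicators at once, so that a person can look at the context.

```python
from collections import defaultdict

# Hypothetical indicator names, loosely mirroring the questions above.
INDICATORS = {
    "unjustified_data_access",
    "data_transfer_out",
    "unjustified_access_change",
    "reactivated_account",
    "unauthorized_area_access",
}

def users_to_review(events, min_distinct=3):
    """Return users who show at least `min_distinct` different indicators.

    `events` is an iterable of (user, indicator) pairs. One indicator on
    its own may be benign, so only a combination is surfaced for review.
    """
    seen = defaultdict(set)
    for user, indicator in events:
        if indicator in INDICATORS:
            seen[user].add(indicator)
    return {
        user: sorted(found)
        for user, found in seen.items()
        if len(found) >= min_distinct
    }

# Made-up sample data: repeated indicators count only once per user.
events = [
    ("alice", "data_transfer_out"),
    ("bob", "unjustified_data_access"),
    ("bob", "data_transfer_out"),
    ("bob", "reactivated_account"),
    ("bob", "data_transfer_out"),
]
flagged = users_to_review(events)
print(flagged)
```

In this made-up sample, only bob is surfaced, because alice shows a single indicator. Whether bob is malicious or just ambitious is still a question for a human with more context.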
What is Context?
Let’s strip away the technology aspect and focus on context itself. Context is any information about any entity that can be used to reduce the amount of reasoning required for decision making. Ultimately, context transforms data into useful information. Context is something we use in every part of our lives, particularly in language. For example, every word in this article is data, and you’re using the context provided in this article to transform that language into information about context.
In terms of language, there are two types of context: cultural context and situational context. Cultural context consists of attributes such as personal backgrounds and experiences people have gone through, their roots and heritage, the history of themselves, their family, their culture, etc.
Situational context takes into account who is involved in the conversation, the background of the participants and what they bring, the theme of the conversation and where the conversation is taking place. The combination of these types of context markers is what is referred to as universal discourse.
Using another example of context that shows up in our everyday lives, we’re all familiar with those ads that show up on our Facebook feeds after we searched for something on Amazon or Google. It feels like I can simply think of an item and an ad shows up in my Facebook feed. Frightening! How do they do this and target me directly? This is done using years and years of science and data going back to the Mad Men era of advertising. Data is still key, but machines are compiling and contextualizing that data faster than ever before.
Think about it – if you don’t know anything about your audience, it’s extremely complicated if not downright impossible to market to them. The importance of data can’t be overstated.
Big Data is a term that’s been thrown around a lot in the past few years and while AI and Machine Learning are opening up even more avenues for data gathering, the value of data can only be measured by its usefulness. That’s why “intelligent data” is a better way of looking at it.
Using Facebook as an example, CEO Mark Zuckerberg knows how old I am, where I live, where I travel, who my family and friends are and what my hobbies are. At this point he probably knows more about me than I do. All of this information can be used to filter out things that are irrelevant to me, Steve, and to focus advertising on things that I would find interesting and that would get me to act, or purchase. The more personal the experience can become, the more inclined I will be to engage with the ad.
Where the Rubber Meets the Road
Context or contextual information is evidence about an entity that can be used to effectively reduce the amount of reasoning required for decision making. This can be done via filtering, aggregation, inference or other like methods. Contextualization excludes irrelevant data from consideration and has the potential to reduce data from several aspects including volume, velocity and the impacts of Big Data.
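As a rough illustration of filtering and aggregation, the Python sketch below drops events aimed at systems that are not critical and then collapses the rest into one count per source and target. The `ASSETS` inventory, the event fields and the `contextualize` function are assumptions made up for this example, not part of any particular product.

```python
from collections import Counter

# Hypothetical asset inventory: which hosts matter and why.
ASSETS = {
    "10.0.1.5": {"role": "production-db", "critical": True},
    "10.0.9.20": {"role": "guest-wifi", "critical": False},
}

def contextualize(events):
    """Filter events to critical assets, then aggregate by (source, target)."""
    relevant = [
        e for e in events
        if ASSETS.get(e["target"], {}).get("critical", False)
    ]
    return Counter((e["source"], e["target"]) for e in relevant)

# Made-up events using documentation-range IP addresses.
events = [
    {"source": "203.0.113.7", "target": "10.0.1.5"},
    {"source": "203.0.113.7", "target": "10.0.1.5"},
    {"source": "198.51.100.2", "target": "10.0.9.20"},
    {"source": "203.0.113.7", "target": "10.0.1.5"},
]
summary = contextualize(events)
print(summary)
```

Here four raw events shrink to a single line item: three hits from one source against the production database. The guest network event is excluded from consideration, which is exactly the reduction in volume that contextualization is meant to provide.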
Generating alerts by simply correlating log data from devices without some sort of validation usually results in an overwhelming flood of alerts. Because the alerts were generated without validation, security teams must validate them manually. Given the number of events your security team has to deal with, you probably can’t afford to spend more than a few minutes deciding if an alert represents a true threat to your organization. The resulting backlog leads to stressed-out security teams. Additionally, many of the alerts are likely to be false positives, as SIEMs are known to flag alerts based on what we call indicators of compromise (IOCs), which need further investigation. All of this increases the mean time to identify, hence the six months it takes to identify a breach.
What are the types of context that need to be considered?
Internal Context: This is information about internal systems, such as a system’s business function, importance, location and what data or assets it houses. Context about internal systems helps an analyst understand whether the observed attack is even relevant to the target system, and helps prioritize the incident. For example, is this potential attack against a production server, or is it a visitor on the guest network?
External Context: Given that only an IP address is included in the event, external context can help attribute who owns the IP address and its geolocation. Reputable threat intelligence is helpful in understanding more about the attacker, the attacker’s intent and if other organizations have been targeted.
Behavioral Analysis: Historical patterns of the behavior and associations of systems and accounts help corroborate whether the observed activity is malicious or just normal behavior. Incidents unfold over time and involve multiple data sources, and adversaries attempt to “live off the land,” meaning they will attempt to hide within authorized administrative tools.
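The three types of context above can be combined into a simple priority score. The Python sketch below is one hedged way to do it: every lookup table (`INTERNAL`, `EXTERNAL`, `HISTORY`), the `Alert` fields and the z-score threshold are hypothetical stand-ins for a real asset inventory, a threat intelligence feed and a behavioral baseline.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Alert:
    source_ip: str
    target_host: str
    account: str
    logins_today: int

# Hypothetical stand-ins for real data sources.
INTERNAL = {"db-prod-01": "production", "guest-ap-3": "guest"}
EXTERNAL = {"203.0.113.7": {"owner": "ExampleHost Ltd", "known_bad": True}}
HISTORY = {"svc-backup": [4, 5, 6, 5, 5]}  # daily login counts per account

def priority(alert, z_threshold=3.0):
    """Score an alert from 0 to 3, one point per type of context."""
    score = 0
    # Internal context: is the target a production system?
    if INTERNAL.get(alert.target_host) == "production":
        score += 1
    # External context: does threat intelligence know the source address?
    if EXTERNAL.get(alert.source_ip, {}).get("known_bad", False):
        score += 1
    # Behavioral analysis: is today's activity far above the account's norm?
    history = HISTORY.get(alert.account, [])
    if len(history) >= 2:
        spread = pstdev(history) or 1.0  # avoid dividing by zero
        if (alert.logins_today - mean(history)) / spread > z_threshold:
            score += 1
    return score

urgent = Alert("203.0.113.7", "db-prod-01", "svc-backup", 40)
routine = Alert("198.51.100.2", "guest-ap-3", "visitor", 5)
print(priority(urgent), priority(routine))
```

With this made-up data the first alert scores 3 and the second scores 0, so an analyst with only a few minutes per alert knows where to start. A real deployment would weight these signals rather than simply counting them.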
We need to tackle these modern security problems with modern solutions. Given the volume and velocity of data and the gap in technology, it is too easy for cyber attackers and malicious insiders to hide their activity within normal user behavior. SIEMs are a necessary first step in addressing the challenge, but they rely on correlation technology to piece the data together, and that has still left us with a mean time to identify of more than six months. This is where we need to apply other sources of data, or context, to transform our data into actionable information. We use context in nearly every part of our lives and businesses. Applying context to security should not be any different. This will help separate the actionable data from the noise.
Contextualized data improves our decision-making process and reduces operational costs by increasing the efficiency of our resources, both of which will reduce our mean time to identify a breach. This allows us to respond much faster and keep the impact and costs associated with a breach down.