Corporations spend billions of dollars annually on compliance audits and remediation, money that is arguably misspent. Whenever a new data privacy standard is released, entire teams are spun up to bring systems into compliance. But think about what that means: the people on those teams have stopped creating value for their company and its shareholders and are instead focused on work that creates none. At best, this is playing defense, and defense doesn't score points!
The job of the IT organization in any company, regardless of size, is to build systems that support the business and add value to it. No business user ever asked their IT partner to build them a compliance system. Instead, they ask for a system that will enable the business to do its job better: something that helps them serve their customers, identify new business, or support the inner workings of the company (such as HR or Finance). When they make the request, they assume that the system being built for them will protect their data from misuse and will therefore be compliant.
If IT organizations built systems with security embedded from day one, with all sensitive data protected by a method that prevents a bad actor from stealing it, they would not have to scramble through remediation every time a new standard is released. That is how organizations put points on the board!
You can secure data using any number of well-established, proven protection methods. These methods are well understood and commercially available; the main ones are encryption, masking, and tokenization. Each has strengths and weaknesses that you must weigh when designing a system. To be blunt, failing to understand them leads to problems and frustration after implementation.
Encryption is very good at rendering your data unreadable while it is encrypted, but you cannot run ad-hoc queries against ciphertext or analyze it with Machine Learning (ML) or Artificial Intelligence (AI); those techniques need clean, readable data to draw correlations, and encryption destroys that. Encryption also requires an expensive supporting ecosystem and significant ongoing labor from administrators to keep everything protected; anyone with a large amount of data understands the pain of key rotation. And because encrypting a data element increases its size, it usually forces massive database and application changes. Implementing encryption on an existing system is like performing open-heart surgery: it will probably succeed, but it will be painful.
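To see why the size increase forces schema changes, consider a minimal sketch of the storage math. The block below simulates (it does not actually encrypt) the typical AES-CBC storage cost for a text column: a 16-byte IV plus the plaintext padded to a 16-byte block boundary, then Base64-encoded. The function name and the exact cipher parameters are illustrative assumptions, not a specific product's behavior.

```python
# Sketch: why encrypting a field usually forces database changes.
# Simulated AES-CBC storage cost: 16-byte IV + plaintext padded to a
# 16-byte block boundary (PKCS#7 always pads), then Base64 for a text column.
import base64
import os

def ciphertext_storage_len(plaintext: str) -> int:
    block = 16
    padded = (len(plaintext.encode()) // block + 1) * block  # PKCS#7 padding
    raw = os.urandom(16 + padded)   # stand-in for IV + ciphertext bytes
    return len(base64.b64encode(raw))

card = "4111111111111111"           # 16 characters, fits a CHAR(16) column
print(len(card), "->", ciphertext_storage_len(card))  # 16 -> 64
```

A 16-character card number balloons to 64 stored characters, so every column, index, and application buffer sized for the original value has to change.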
Masking works for lower-lifecycle tasks (development, testing, training) because it takes a copy of the live data and obfuscates the sensitive parts, replacing characters with masking characters. You’ve no doubt received a sales receipt that shows your credit card number as a string of “X” characters followed by the last 4 digits of the card. A well-designed masking system is fine while you are developing a system, but it does nothing to protect data once that data has been promoted to production. Most SQL engines can return masked data to a user based on their security profile, but that does nothing to protect the underlying data from a bad actor who has gained administrator privileges. If a sophisticated attacker reaches the database itself, you will lose the data.
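The receipt-style masking described above can be sketched in a few lines. `mask_card` is a hypothetical helper, not any vendor's API; note that masking is one-way, since the original digits are simply gone from the output.

```python
# Sketch of receipt-style masking: keep only the last 4 digits,
# replace everything else with the masking character "X".
def mask_card(card_number: str, keep: int = 4) -> str:
    digits = card_number.replace(" ", "").replace("-", "")
    return "X" * (len(digits) - keep) + digits[-keep:]

print(mask_card("4111-1111-1111-1111"))  # XXXXXXXXXXXX1111
```

Because the masked value cannot be reversed, it is useful for display and for lower environments, but it cannot stand in for the real value in production processing the way a token can.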
Tokenization replaces sensitive data values with randomly generated tokens while retaining the original format of the data. Tokens and the original data elements have a 1:1 relationship, so the same token is always generated for a given input. Because tokenized data looks exactly like the original data, no database or application changes are needed, and users can still query the data or run data-analysis techniques (ML, AI) against it with the same results. The original tokenization systems used a large database containing tokens and an encrypted value, along with a hash to identify whether a value had already been tokenized. These systems worked just fine for many years, but they have encryption at their heart, with all of the standard encryption maintenance that implies. Gen-2 tokenization systems were built to handle a very limited set of data fields, such as credit card numbers or Social Security numbers; they used a token vault to perform their operations, but they did not scale or handle other types of data. Gen-3 systems were created to support the transformation to machine-intelligence platforms and modern data engineering strategies, with all of the agility and scale those demand, and they support tokenization of data of any length and type (numbers and strings).
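The properties described above (deterministic 1:1 mapping, format preservation, reversibility via a vault) can be illustrated with a toy vault-based tokenizer in the style of the early generations. `ToyTokenizer` is a hypothetical sketch for illustration only; a real Gen-3 platform is vaultless, cryptographically hardened, and built to scale, none of which this dict-backed example attempts.

```python
# Toy vault-based tokenizer: deterministic (1:1) and format-preserving.
# Illustrative only -- NOT a secure or scalable implementation.
import secrets
import string

class ToyTokenizer:
    def __init__(self):
        self._vault = {}   # token -> original value (for detokenization)
        self._index = {}   # original value -> token (enforces 1:1 mapping)

    def tokenize(self, value: str) -> str:
        if value in self._index:            # same input always yields same token
            return self._index[value]
        while True:
            # Preserve each character's "shape": digits stay digits,
            # letters stay letters, punctuation passes through.
            token = "".join(
                secrets.choice(string.digits) if c.isdigit()
                else secrets.choice(string.ascii_letters) if c.isalpha()
                else c
                for c in value)
            if token not in self._vault:    # avoid token collisions
                break
        self._vault[token] = value
        self._index[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

t = ToyTokenizer()
tok = t.tokenize("4111111111111111")
print(len(tok), tok == t.tokenize("4111111111111111"))  # 16 True
```

The token is the same length and character class as the input, so it drops into existing schemas unchanged, yet only the vault (or, in modern systems, the tokenization engine) can recover the original value.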
All professional sports teams have specialists who perform specific tasks; in American football, for example, you would not ask the placekicker to line up under center as the quarterback, or a linebacker to attempt a field goal. Yes, they are all football players, but each has a specific role on the team, and those roles are rarely interchangeable. The same is true of the techniques above: they all have their uses, but in specific contexts and circumstances, to solve specific problems.
When designing a system, engineers should plan to protect all of the data the system works with, even if they don’t implement that protection immediately and even if, at the outset, certain data isn’t considered sensitive. There is no harm in tokenizing a customer’s address: it requires no additional storage space and can still be queried, and if a new regulation is enacted in the future, the system will already be compliant without any additional work. Gen-3 tokenization systems are very fast, too, so the few microseconds required to convert a token back to clear text will not be noticed when the original value is needed.
Only a few vendors offer Gen-3 tokenization systems. The solution that seems to be gaining market share at the moment is SecureDPS from comforte (https://www.comforte.com/). SecureDPS is a platform that can discover where sensitive data is stored, map data lineage (who touches it, who uses it), and protect data (via tokenization, format-preserving encryption, or masking). Better yet, it is built for cloud-native applications, so it protects your data wherever that data lives: on-premises, in the cloud, or even in containers.
As you do your due diligence research, keep these desired outcomes in mind:
- Make sure your company’s efforts are focused on your core business, not on achieving compliance mandates.
- Compliance should be something that your supporting data protection solution does for you, allowing your engineers to focus on providing business value.
- Your IT team should research data-centric solutions that protect the data through its lifecycle (data acquisition, data operations, data intelligence).
- Have your IT team examine the pros and cons of each data protection method: encryption, masking, and tokenization. Compared side by side, tokenization entails far less administration while providing much stronger data protection.
- Compliance and data security should enable your business, not block it!
Isn’t it time that you started scoring points instead of trying to keep up with the onslaught of new rules and regulations?