Data Protection: Encryption vs. Masking

Data privacy regulations have proliferated in recent years. The EU’s General Data Protection Regulation (GDPR) has probably made the most headlines, but several US states and other countries have also passed data privacy laws. As a result, organizations are compelled, more than ever, to take steps to protect the sensitive data in their care from compromise.

The number of recent data breaches demonstrates that organizations are often unable to do so. A cyber defender has to locate and fix every vulnerability in an organization’s defenses, while an attacker may only need to find one to be successful. A sufficiently motivated and patient attacker is almost certain to find a way inside.

As a result, organizations need to take steps to ensure that a cyber incident doesn’t necessarily mean a data breach. This can be achieved by protecting the data itself, so that whatever an intruder manages to reach is of little use to them. Two potential solutions for this are data encryption and data masking.

Encryption and Masking

Data encryption and data masking can both be considered types of obfuscation, that is, ways of hiding the true value of sensitive data. The main differences between them are their properties and how they can be applied.

Data encryption involves performing a reversible operation upon data that makes it computationally infeasible to extract the original data without knowledge of some secret (the key).

There are two main types of encryption, symmetric and asymmetric, which mainly differ in how many keys are used and how they’re applied. In symmetric key encryption, the same key is used for both encryption and decryption, meaning that it needs to be shared in advance over a secure channel. In asymmetric encryption (also called public key encryption), there is a pair of public and private keys. The public key is used for encryption and the private key is used for decryption. The desired end result of encryption is a ciphertext that looks as random as possible.
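To make the symmetric case concrete, here is a minimal sketch using only Python’s standard library. It is a toy: the keystream construction (HMAC-SHA256 in counter mode) and the names toy_encrypt and toy_decrypt are assumptions made for illustration, and the scheme provides no integrity protection. Real systems should use a vetted cryptographic library instead. The point is only to show that the same key both scrambles the data and recovers it.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # One HMAC-SHA256 output (32 bytes) per counter value.
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a keystream; a fresh random nonce is prepended."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Regenerate the same keystream from the shared key and undo the XOR."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)           # the shared secret
ciphertext = toy_encrypt(key, b"555-867-5309")
recovered = toy_decrypt(key, ciphertext)
```

Anyone holding key gets the original bytes back from toy_decrypt. Anyone without it sees only the nonce followed by bytes that look random.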

Data masking, on the other hand, is designed to produce an output that looks realistic but isn’t the real data. For example, masking a phone number would produce another phone number that can’t be easily linked to the original phone number. With masking, it’s not necessary to be able to reverse the operation at the other end, since the data is being sent to an untrusted environment anyway.
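The phone number example might look something like the sketch below. It assumes a simple rule: replace every digit with a random one and leave separators alone. The function name mask_phone is hypothetical, and a production masking tool would likely also respect numbering-plan rules such as valid area codes, which this ignores.

```python
import secrets
import string


def mask_phone(phone: str) -> str:
    """Replace each ASCII digit with a random digit, keeping length and separators."""
    return "".join(
        secrets.choice(string.digits) if ch in string.digits else ch
        for ch in phone
    )


masked = mask_phone("555-867-5309")
```

The result has the same shape as the input, so software that expects a phone number will accept it, but nothing is stored that would let the recipient get back to the original.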

Pros and Cons

Encryption and masking are both ways to protect sensitive information from unintentional disclosure. In both cases, an unauthorized user who gains access to the obfuscated data does not have the ability to reverse the obfuscation and retrieve the original data. If done properly, encrypted or masked data that is exposed during a data breach reveals nothing about the original, sensitive data and, depending on the applicable regulation, may not be a reportable breach.

Encryption and masking both have their pros and cons. The main differentiators between them are the reversibility of the process, its security, and the usability of the obfuscated data.

Reversibility of the obfuscation operation can be a major advantage of encryption and a major disadvantage of masking. Encryption is commonly used to protect data at rest or in transit where the intended recipient is expected to be able to deobfuscate and retrieve the original data. If the data is encrypted (and the recipient has the appropriate key), this is possible. With data that has been masked to a random (but realistic) value, this isn’t possible without access to the lookup table that maps masked values back to the originals (which wouldn’t be available to the recipient).
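The lookup table idea can be sketched as follows. This is an illustration under assumptions, not a description of any particular product: the class LookupMasker and its methods are hypothetical, and the table lives in memory only on the trusted side that performs the masking.

```python
import secrets
import string


class LookupMasker:
    """Masks values consistently and keeps the mapping on the trusted side only."""

    def __init__(self) -> None:
        self._forward = {}  # real value -> masked value
        self._reverse = {}  # masked value -> real value

    def mask(self, value: str) -> str:
        # Reuse the earlier result so the same input always masks the same way.
        if value in self._forward:
            return self._forward[value]
        for _ in range(100):  # retry on the rare clash with an existing masked value
            candidate = "".join(
                secrets.choice(string.digits) if ch in string.digits else ch
                for ch in value
            )
            if candidate not in self._reverse:
                self._forward[value] = candidate
                self._reverse[candidate] = value
                return candidate
        raise RuntimeError("could not find an unused masked value")

    def unmask(self, masked: str) -> str:
        """Only possible for whoever holds the table."""
        return self._reverse[masked]


masker = LookupMasker()
masked = masker.mask("555-867-5309")
original = masker.unmask(masked)
```

Only masked ever leaves the trusted environment. Because the recipient never receives the table, they have no way to call the equivalent of unmask.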

The downside of encryption’s reversibility is its potential impact on security. In order to use the encrypted data, the recipient needs to have the key that can decrypt it. If this key is not properly protected, then an attacker may be able to gain access to it, which defeats the entire purpose of the encryption process. With masking, the masking operation isn’t designed to be reversible by the recipient, so the attacker doesn’t have what they need to reverse it either.

Finally, masking has an advantage over encryption with regard to the usability of the obfuscated data. If the data is intended for use by an untrusted recipient (like for testing software in a development environment), the obfuscated data needs to be realistic in order to be useful. Encrypted data is essentially indistinguishable from random bytes, but a data masking operation can be designed to ensure that the obfuscated data still meets the needs of the end user.

Choosing the Right One

Data encryption and data masking are both extremely useful methods of protecting data, and the right choice largely depends on the situation and intended use case. The primary considerations are how the recipient is meant to use the data, whether the obfuscation needs to be reversible, and whether the recipient is considered “trusted”.

The classic use cases for data encryption are for protecting data in transit and data at rest. Anytime you use HTTPS for web browsing or use full-disk encryption to protect your files against theft of your laptop, you’re using encryption. In these cases, you trust the recipient (whether the webserver or yourself) and intend for them to have full access to the data after deobfuscation (your Word documents aren’t much use if they’ve been obfuscated to just “look right”).

Data masking is a good choice when dealing with unforeseen situations or untrusted applications like software in the test environment. Placing a data masking solution between your crown jewel database and the applications that query it, and masking the response to any request that hasn’t been explicitly authorized, may be a good idea. That way, any legitimate use of your data gets the real thing while a hacker exploiting a vulnerability gets something that looks right but is useless.
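A rough sketch of that gatekeeping logic is shown below. Everything in it is a hypothetical stand-in: the CUSTOMER_PHONES dictionary plays the role of the database, AUTHORIZED_CALLERS is an assumed allow-list, and a real deployment would authenticate callers rather than trust a name passed in as a string.

```python
import secrets
import string

# Hypothetical stand-ins for a real database and a real authorization system.
CUSTOMER_PHONES = {"alice": "555-867-5309", "bob": "555-234-9876"}
AUTHORIZED_CALLERS = {"billing-service"}


def mask_digits(value: str) -> str:
    """Replace each ASCII digit with a random digit, keeping the format."""
    return "".join(
        secrets.choice(string.digits) if ch in string.digits else ch
        for ch in value
    )


def fetch_phone(caller: str, customer: str) -> str:
    """Return the real value to authorized callers and a masked one to everyone else."""
    real = CUSTOMER_PHONES[customer]
    if caller in AUTHORIZED_CALLERS:
        return real
    return mask_digits(real)


trusted_view = fetch_phone("billing-service", "alice")
untrusted_view = fetch_phone("unknown-app", "alice")
```

The default here is to mask: a caller has to be on the allow-list to see real data, so a request that arrives through an unexpected path falls through to the masked branch.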

In general, most organizations should be deploying both data encryption and masking solutions but aren’t doing enough of either. Taking the time to lock down access to your sensitive data to only authorized users may be what saves you from a costly data breach.