Sharing sensitive data is made possible when critical data can be sanitized, anonymized or removed. Alacer’s Data Cloak transforms sensitive data into a shareable public file and a private file. The private file stays with the data owner and can be re-integrated with the public file after analysis.
The data security paradox
Data gains value when it’s available to the right audience. The greater the audience, the greater the potential for unlocking value in the data. However, it is often the most sensitive data that is also the most valuable. Customer records, transaction histories, and medical information are a rich source for data mining. These datasets are also closely guarded, and their confidentiality might be legally protected or their access restricted by corporate policy. The reasons are clear: a data breach can be disastrous for a company and its customers. This is the data security paradox: the more valuable the data, the harder it is to obtain.
WHY IT MATTERS
First and foremost, the data security paradox affects the owners of the data. A bank department might want to analyze transaction data in a public cloud, like Azure or AWS, but is restricted from doing so. Or the bank department might want an outside analyst to look for money laundering activities, but is afraid the data might be leaked to the Internet.
Second, the data have value to an audience outside the data owner’s organization and its trusted partners. Prevailing wisdom is that data cannot be safely shared with the public. “Bad actors” could use the data maliciously or inappropriately. However, this audience also contains researchers, non-profits, NGOs, think tanks, and altruistically minded individuals who can use the data to discover new insights, advance everyone’s understanding of a subject, and create new value. A McKinsey Global Institute report states that “open data can help unlock $3 trillion to $5 trillion in economic value annually…”
SAFELY SHARING DATA
Organizations share data, even sensitive data. Smart organizations lower their risk by anonymizing or obfuscating sensitive data, especially personally identifiable information (PII). PII includes identifying information like name, phone number, or date and place of birth. PII extends to anything that can identify who the person is, including medical health history or IP address. Getting data in a state where it is ready to share means removing PII and anything that might be damaging in the context of the data. This is sometimes called “sanitizing” the data.
A common approach to sanitizing is to remove all sensitive attributes of the data before sharing it. This approach has two potential problems. First, the sensitive fields are often the primary keys of the data. Once these keys are gone, the ability to connect sanitized records with the original data is lost. This can be important. For example, if a third party analyzes sanitized bank transactions to find money laundering, the data owners will ultimately want to know which customers are implicated. If all PII is simply excised, this could be difficult or impossible to do.
The second problem is that sensitive data can be intrinsically valuable, so removing it reduces the value of what is shared. Sometimes this loss is unavoidable. For example, removing a customer’s name from anti-money laundering data before sharing it would preclude an outside analyst from researching that customer’s legal filings. However, sometimes the value of the information can be retained even if the data is sanitized. To continue with the banking example, the transfer of money between customers can reveal important patterns, even if the names of the customers are not known.
LIFTING THE BURDEN OF SAFELY SHARING DATA
The average analyst does not have the tools or experience to sanitize data. We have seen clients mitigate the problem by either never sharing data or sharing data after signing NDAs. Sometimes, the data are shared with nothing more than the hope that they are not compromised.
We set out to lift the burden of sanitizing data so they can be safely shared. We wanted to create an application that data analysts could use to remove sensitive information so that data could be shared with someone in another group at the same organization, an outside business partner, or even industry analysts and researchers. Important requirements for the solution included:
Spreadsheets: Most of the data our clients want to share is in Excel spreadsheets. Even if the data originates in an RDBMS, the data is typically exported as a flat file before being shared.
Accessible: Many data analysts work in secure environments where they cannot install applications on their computers. None of our potential users wanted to install an application and many would not trust an executable file even if they could install applications.
Simple: Complexity is a common source of security weaknesses. Simple systems are easier to analyze, behave more predictably, and are less likely to contain bugs.
Isolated: The solution had to run entirely on the user’s computer with no need to access the network. Security becomes much more difficult once data is transmitted over a network.
Irreversible: Data must be sanitized in such a way that it is impossible to recover sensitive information from the shared data after it has been anonymized or obfuscated.
Retain relationships: Relationship mining is an important part of many analyses. The solution had to preserve relationships in the original data.
Two-way: In our use cases, sanitizing data to make it ready to be shared is only half the equation. Once the data is shared, another party typically analyzes it and enhances it by adding additional data attributes. For example, an outside analyst might work with shared customer data to create a risk score for each customer. The analyst then provides that information back to the data owner. It should be easy for the data owner to join the enhanced data with the original, sensitive data.
The Alacer Data Cloak solution
To meet our requirements, we created a browser-based application that transforms a file with sensitive data into a public file and a private file. The shareable public file contains sanitized information. It is safe to share with outside parties. The private file contains the sensitive information and stays with the data owner. The private file can be used to re-integrate (uncloak) analysis provided by an outside party.
Browser-based application: The Data Cloak runs entirely in a browser. This makes the application very accessible; there is no need to install or start an executable. Browsers are installed on almost every workstation. Browsers also allow the application to run transparently on different operating systems (e.g. Windows, Mac, Linux).
Comma-separated value files: The Data Cloak operates on comma-separated value (CSV) files. All spreadsheet applications can save their output as CSV files. Spreadsheet programs can also import CSV files easily. CSV files are not specific to an operating system, browser or other technology.
User interaction: The idea behind Data Cloak is simple. The Data Cloak scans an input file and presents the user with a simple list of choices for each attribute in the data. The user easily selects one of three actions for each attribute:
- Declare the attribute is safe to remain in the shared data.
- Declare the attribute sensitive and cloak it.
- Declare the attribute sensitive and remove it from the shared data.
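The three choices can be pictured as a small per-column plan. The following Python sketch is illustrative only and is not the Data Cloak's own code; the `Action` enum, the `plan_columns` helper, the sample columns, and the rule that unclassified columns default to removal are all assumptions made for this example.

```python
import csv
import io
from enum import Enum


class Action(Enum):
    KEEP = "keep"      # safe to remain in the shared data
    CLOAK = "cloak"    # sensitive; replace with a cloaked value
    REMOVE = "remove"  # sensitive; drop from the shared data


def plan_columns(header, choices):
    """Group column names by the action chosen for each one.

    Columns the user did not classify default to REMOVE, a conservative
    assumption made for this sketch only.
    """
    plan = {action: [] for action in Action}
    for name in header:
        plan[choices.get(name, Action.REMOVE)].append(name)
    return plan


# Hypothetical input file with one data row.
sample = "name,account,amount,phone\nAda,1001,250.00,555-0100\n"
header = next(csv.reader(io.StringIO(sample)))

# Hypothetical user choices; "phone" is deliberately left unclassified.
choices = {
    "name": Action.CLOAK,
    "account": Action.CLOAK,
    "amount": Action.KEEP,
}

plan = plan_columns(header, choices)
print(plan[Action.KEEP], plan[Action.CLOAK], plan[Action.REMOVE])
```

Defaulting unknown columns to removal means a forgotten attribute is dropped from the shared file rather than leaked.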
Local processing: No data, not even sanitized data, are transmitted to another computer. The output is generated and saved to the user’s local storage. The sanitized public file can then be safely emailed to the intended recipient.
Random generation: Sanitized values are randomly generated. The sensitive data is not used as part of the sanitizing process. There is no way to divine the original data from their sanitized replacement values.
Global substitution: Each sensitive value is replaced by the same cloaked value everywhere it appears, which preserves relationships between entities.
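Random generation and global substitution can be combined in a few lines. The sketch below is a minimal Python illustration under stated assumptions, not the product's implementation: the `cloak` function, the hex token format, the in-memory `private_map`, and the sample transfer data are all hypothetical. Tokens come from `secrets.token_hex`, so they are not derived from the original values, and one shared lookup table ensures a value receives the same token in every row and column.

```python
import csv
import io
import secrets


def cloak(csv_text, cloak_columns, remove_columns=()):
    """Return (public_csv, private_map) for a CSV string.

    private_map maps each random token back to its original value and
    is meant to stay with the data owner.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [c for c in reader.fieldnames if c not in remove_columns]
    forward = {}      # original value -> token (global substitution)
    private_map = {}  # token -> original value

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        public_row = {}
        for col in kept:
            value = row[col]
            if col in cloak_columns:
                if value not in forward:
                    # Random token, never computed from the value itself.
                    token = secrets.token_hex(8)
                    while token in private_map:  # retry on a rare collision
                        token = secrets.token_hex(8)
                    forward[value] = token
                    private_map[token] = value
                value = forward[value]
            public_row[col] = value
        writer.writerow(public_row)
    return out.getvalue(), private_map


# Hypothetical transfers between three customers.
sample = "sender,receiver,amount\nAda,Bob,100\nBob,Cy,40\nAda,Cy,5\n"
public_csv, private_map = cloak(sample, {"sender", "receiver"})
print(public_csv)
```

Because the lookup is keyed on the value rather than the column, the token for a customer who receives money in one row matches the token for the same customer sending money in another, so the transfer pattern survives cloaking.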
Uncloak application: A second browser-based application, Alacer Data Uncloak, recombines the public and private files to uncloak or restore the data.
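Conceptually, uncloaking is a lookup from cloaked values back to the originals. The Python sketch below assumes the private file can be read as a simple token-to-value mapping; the actual private file format is not described here, and the `uncloak` function, the tokens, and the `risk_score` column are hypothetical. Restoring attributes that were removed outright is not shown.

```python
import csv
import io


def uncloak(public_csv, private_map, cloaked_columns):
    """Replace tokens in cloaked_columns with their original values."""
    reader = csv.DictReader(io.StringIO(public_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for col in cloaked_columns:
            # Unknown tokens are left untouched rather than guessed.
            row[col] = private_map.get(row[col], row[col])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical public file returned by an analyst who added risk_score.
enhanced = "customer,risk_score\n9f3a,0.91\n02bc,0.12\n"

# Hypothetical contents of the owner's private file.
private_map = {"9f3a": "Ada", "02bc": "Bob"}

restored = uncloak(enhanced, private_map, ["customer"])
print(restored)
```

Columns added by the analyst pass through unchanged, so the data owner receives the new attributes alongside the restored identifiers.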
Start cloaking your sensitive data now
Alacer’s Data Cloak and Uncloak applications are simple, security tested and affordable. They streamline managing data files for any organization and provide a new standard of protection for sharing sensitive data.
The new Alacer Data Cloak
- Run off-line using existing browser software
- Secure sensitive data with reversible cloaking
- Powerful controls in easy-to-use interface
- Share data freely without risk