Securely storing sensitive information

Storing sensitive information isn't something new; it is a problem that was solved a long time ago.

Now, I'm not talking about sensitive information such as passwords, in which case you store only the hashes of the passwords.

No, I'm talking about information like credit card numbers, social security numbers, medical information, etc. Basically, any data that you need for your operation but don't want a random hacker to get their hands on.

The reason developers regard it so lightly is that the solution seems rather easy. Encrypt the data, and you're done!

Well, not so fast...

First, let's recap how encryption works (if you want to deep dive into cryptography, I recommend taking the Coursera Crypto 1 Course by Dan Boneh).

At its core, encryption consists of two functions, Enc and Dec.

Enc receives a plaintext (PT) and a key and outputs the encrypted ciphertext (CT).

Enc(PT, key) = CT

Dec does the opposite. It receives a ciphertext and a key and outputs the plaintext it originated from.

Dec(CT, key) = PT
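To make the Enc/Dec pair concrete, here is a minimal sketch using only the Python standard library. It is a toy stream cipher (HMAC-SHA256 in counter mode, XORed with the data) that I am assuming purely for illustration; it has no integrity protection, and a real system should use a vetted algorithm such as AES-GCM from a maintained library. The names `enc`, `dec` and `_keystream` are hypothetical.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # assumption: a random 16-byte nonce is prepended to every ciphertext


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Produce `length` key-dependent bytes with HMAC-SHA256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def enc(pt: bytes, key: bytes) -> bytes:
    """Enc(PT, key) = CT. Toy cipher for illustration only."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(key, nonce, len(pt))
    return nonce + bytes(p ^ s for p, s in zip(pt, stream))


def dec(ct: bytes, key: bytes) -> bytes:
    """Dec(CT, key) = PT. XOR with the same keystream undoes the encryption."""
    nonce, body = ct[:NONCE_LEN], ct[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
ct = enc(b"4111 1111 1111 1111", key)
assert dec(ct, key) == b"4111 1111 1111 1111"
```

The point of the sketch is the round trip: whoever holds `key` can recover the plaintext, which is exactly why the rest of this article is about where that key lives.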

OK, nice!

But wait, once I encrypt the sensitive data, where do I store the key?

With that question I started a journey deep down the rabbit hole, and came to understand that although storing the information itself is easy, there is no consensus on what to do with the collateral, such as keys.

So I started searching for a solution: speaking with security and IT specialists, searching the web and reading papers. Finally, I managed to find one that answered my concerns.

In this article I would like to share with you the concepts and approaches for solving this problem, and show you how to build a solution that will minimize the threat of your sensitive information being compromised.

* Throughout this article I refer to a single key for simplicity. Of course, there might be numerous keys used to encrypt the data, according to different segregation parameters such as customer, geo region, etc.

Approach #0 - Use Java Keystore

Java provides a very straightforward way for storing keys and certificates called Java Keystore.

Keystores are usually password protected and stored on the machine running your application.

Although this is a very easy way to store the key, it has several disadvantages:

In order to read from the Keystore you need a password, which means your system needs to hold it somewhere. You'll be amazed how insecurely such passwords are stored. The worst thing I've seen was code that reads it from a plain text file on disk or, even worse, has it hard coded, making it available to practically anyone. Keeping the password secure is not an easy task and requires special care; we'll talk about this later.
When you run a cloud service you usually don't have a single machine. You have tens and sometimes hundreds of machines. This solution forces you to propagate the Keystore across the cluster. In practice it is not scalable and imposes limitations on the system that will become a real pain when you grow or make architectural changes.

Approach #1 - Generate the key based on a user secret

Instead of storing the key, the system can generate it based on some secret that is known only to the user. For example, use the user's password as the secret and generate the key by applying a hash function on the password (coupled with some salt).

This way the key is never stored in the system, and the data for each user is encrypted with its own key.

Let's talk about the cons:
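As a rough sketch of this idea, the standard library's `hashlib.pbkdf2_hmac` can turn a password and a salt into a 256-bit key. Note that I am swapping a plain hash for PBKDF2, a deliberately slow key derivation function, because a single fast hash makes password guessing cheap. The function name `derive_key`, the iteration count and the sample password are assumptions for illustration.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a user password.

    The salt is not secret and can be stored next to the user's record;
    the password and the derived key are never stored.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)  # generated once per user and kept with the user's record
key = derive_key("correct horse battery staple", salt)
assert len(key) == 32
```

The same password and salt always yield the same key, so the system can recreate the key whenever the user supplies the password.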

Encryption and decryption are always user initiated; without the user your system will hit a dead end. Remember that some of the scenarios are not user initiated and are triggered by schedulers.
What happens to the encrypted data when the user needs to change their password? In this case you'll need to re-encrypt all of the data with the new key, and this is very expensive.
You put your trust in the user. User passwords are the first thing that the attacker will try to get in order to get inside the system. So, this approach might not be as secure as you might think.

Approach #2 - Split and Stash

A significant downside of the previous approach was the fact that you needed a user in order to operate.

In this approach we store the key in a separate database. Even better, split the key into N parts key-1, key-2, ... key-n and store each part in its own database.

This will make a successful attack on your system significantly harder. Now the attacker needs to get access to N+1 databases in order to get the information. And assuming that each database is protected correctly, with segregation of users and passwords (meaning that you don't have one user and one password for all databases), the task of stealing the data becomes a very hard one.
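The text does not say how the key is split, so the following is one possible sketch, assuming XOR-based splitting: N-1 parts are random, and the last part is the key XORed with all of them. Unlike simply cutting the key into chunks, any N-1 parts reveal nothing about the key. The names `split_key` and `join_key` are hypothetical.

```python
import secrets
from functools import reduce


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int) -> list[bytes]:
    """Split `key` into n parts; all n parts are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two parts")
    parts = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    # The last part is key XOR part-1 XOR ... XOR part-(n-1).
    parts.append(reduce(_xor, parts, key))
    return parts


def join_key(parts: list[bytes]) -> bytes:
    """Rebuild the key by XORing all parts together."""
    return reduce(_xor, parts)


key = secrets.token_bytes(32)
parts = split_key(key, 3)  # store parts[0], parts[1], parts[2] in separate databases
assert join_key(parts) == key
```

Each element of `parts` would go to its own database, and the application would fetch all of them before it can decrypt anything.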

But this approach has one significant flaw: its operational cost is just too high. To be honest, no organization is insane enough to implement it.

Just think about it: instead of one database that stores the data, you need to add N additional databases just for the keys! You need them to be highly available and fault tolerant, which makes your database cluster at least twice the size.

Moreover, in order to use the data your system needs to make N+1 network requests each time it needs to decrypt something, and this can take a long time.

Approach #3 - Encrypt the key

OK... if the key became "The" sensitive data, why not store it encrypted as well?

But here comes the chicken and the egg: in order to encrypt the key you need another key, a Key Encryption Key (KEK)!

Key Management System (KMS) to the rescue!

A KMS is a server, service or system that provides secure storage, access and management of the organizational keys.

There are a lot of KMSs out there. In this article I'll be talking specifically about AWS KMS, which is suitable both for cloud services and for non-SaaS systems.

What is AWS KMS

It's an AWS service that provides key management capabilities based on Hardware Security Modules (HSMs) and is FIPS 140-2 compliant.

In plain language, this means that your keys are stored with the most security and compliance possible (remember, there is no 100% security).

The basic concept of AWS KMS is the Customer Master Key (CMK), which is generated and stored in AWS. This key is used to create Data Encryption Keys (DEKs) that your system can use to encrypt the data.

Now, in order to understand how AWS KMS helps us solve our problem, we need to understand how two things work: (1) the DEK creation and usage flow and (2) how to get the credentials to authenticate against the service.

DEK creation and usage flow

AWS KMS provides a simple Web API that exposes all the functionality we need.

The API that allows you to generate a DEK is called GenerateDataKey.

In the request you specify the ID of the CMK. The CMK acts as the KEK: KMS generates the DEK and encrypts it under that CMK. You should also specify the key spec of the DEK, e.g. AES_256 or AES_128.

Here is an example of the request body:
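The original example is not available here, so the following is a sketch of what such a body might look like. `KeyId` and `KeySpec` are the GenerateDataKey request parameters; the alias `alias/example-app-key` is a made-up placeholder, not a real key.

```python
import json

# Hypothetical request body for GenerateDataKey.
request_body = {
    "KeyId": "alias/example-app-key",  # placeholder: key ID, key ARN or alias of your CMK
    "KeySpec": "AES_256",              # or "AES_128"
}

print(json.dumps(request_body, indent=2))
```

In practice you would rarely build this JSON by hand; the AWS SDKs expose the same parameters as arguments of a `generate_data_key` call.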

Now comes the awesome part, which is the missing link that makes everything work...

AWS KMS will return the key in 2 formats:

Plaintext
Ciphertext

Here is an example of the result:
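The original example is not available here either. As a sketch, the response is a JSON document whose `Plaintext` and `CiphertextBlob` fields are base64-encoded, alongside the `KeyId` of the CMK that was used. The values below are random placeholders generated locally, the ARN is a dummy, and `parse_response` is a hypothetical helper.

```python
import base64
import json
import os

# Placeholder values standing in for what KMS would return.
placeholder_dek = os.urandom(32)   # stands in for the plaintext DEK
placeholder_blob = os.urandom(64)  # stands in for the opaque encrypted DEK

sample_response = json.dumps({
    "KeyId": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # dummy ARN
    "Plaintext": base64.b64encode(placeholder_dek).decode("ascii"),
    "CiphertextBlob": base64.b64encode(placeholder_blob).decode("ascii"),
})


def parse_response(body: str) -> tuple[bytes, bytes]:
    """Return (plaintext DEK, encrypted DEK) decoded from the JSON response."""
    doc = json.loads(body)
    return base64.b64decode(doc["Plaintext"]), base64.b64decode(doc["CiphertextBlob"])


dek, encrypted_dek = parse_response(sample_response)
assert len(dek) == 32  # 256-bit key, matching a KeySpec of AES_256
```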

You can use the plaintext key to encrypt your data, but you should never store it. Your code should discard it from memory as soon as possible, minimizing the chance of exposure (remember the no 100% security rule).

The encrypted format, however, is stored alongside the encrypted data.

OK, so we have everything encrypted, but what do we do when we need to decrypt the data?

The Decrypt API lets you send encrypted data to AWS KMS and receive the plaintext back.

This means that you can use this API to send the encrypted key and get the plaintext DEK in the response, after which you can decrypt the data you need.

The most important thing is that in the entire flow the key is never stored in the system. The KEK that is used to encrypt our DEK is stored in AWS KMS and never exposed to anyone.

This means that even in case of a breach in your system, the attacker won't be able to get the key to decrypt your data.
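The whole flow can be sketched end to end without touching AWS. In the code below, `FakeKms` is a hypothetical in-memory stand-in whose two methods mirror the call shapes of the boto3 KMS client (`generate_data_key(KeyId=..., KeySpec=...)` and `decrypt(CiphertextBlob=...)`), so that a real client could be passed in its place. The data cipher is a toy XOR keystream that I am assuming only to keep the example dependency-free; a real system should use an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import os


def _xor_stream(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Toy cipher (HMAC-SHA256 keystream XOR data); the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class FakeKms:
    """Hypothetical in-memory stand-in for a KMS client, for illustration only."""

    def __init__(self) -> None:
        self._master = os.urandom(32)  # plays the CMK/KEK; never leaves this object

    def generate_data_key(self, KeyId: str, KeySpec: str = "AES_256") -> dict:
        dek = os.urandom(32 if KeySpec == "AES_256" else 16)
        nonce = os.urandom(16)
        blob = nonce + _xor_stream(dek, self._master, nonce)
        return {"KeyId": KeyId, "Plaintext": dek, "CiphertextBlob": blob}

    def decrypt(self, CiphertextBlob: bytes) -> dict:
        nonce, body = CiphertextBlob[:16], CiphertextBlob[16:]
        return {"Plaintext": _xor_stream(body, self._master, nonce)}


def encrypt_record(kms, key_id: str, plaintext: bytes) -> dict:
    """Ask KMS for a fresh DEK, encrypt the data, keep only the encrypted DEK."""
    resp = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
    nonce = os.urandom(16)
    ciphertext = _xor_stream(plaintext, resp["Plaintext"], nonce)
    # The plaintext DEK is not part of the stored record.
    return {"encrypted_dek": resp["CiphertextBlob"], "nonce": nonce, "ciphertext": ciphertext}


def decrypt_record(kms, record: dict) -> bytes:
    """Send the encrypted DEK to KMS, then decrypt the data with the returned DEK."""
    dek = kms.decrypt(CiphertextBlob=record["encrypted_dek"])["Plaintext"]
    return _xor_stream(record["ciphertext"], dek, record["nonce"])


kms = FakeKms()
record = encrypt_record(kms, "alias/example-app-key", b"4111 1111 1111 1111")
assert decrypt_record(kms, record) == b"4111 1111 1111 1111"
```

Everything in `record` can sit in your database: without access to the KMS, the encrypted DEK is useless to whoever steals it.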

Authentication credentials

We talked a lot about how to get the keys, but authentication against the API is another aspect that can serve as an attack surface, similar to the disadvantage of storing the key in a Keystore (approach #0).

The fact that you need to provide credentials forces your system to store them somewhere and, in theory, exposes them to potential theft.

If you are running on AWS, the infrastructure provides a way to reduce the possibility of credential theft: the instance can get temporary credentials directly from the instance metadata once it starts up.

You can create an IAM role and assign it to the EC2 instance; your system can then read the role's temporary credentials from the instance metadata and use them to authenticate against the API.

Those temporary credentials are guaranteed to be recognized by AWS KMS, even after the machine reboots and gets new temporary credentials.

For details you can check the AWS documentation on how to retrieve security credentials from instance metadata.

In fact, this eliminates the need to store long-lived credentials altogether, making it far harder for an attacker to gain access to the credentials used for the KMS API.

Conclusion

Storing sensitive data might seem an easy and well-solved task, but it involves a lot of details around handling the keys and credentials that might expose the data to theft.

This becomes even harder when a cloud service architecture is involved.

Remember, there is no 100% security and the solution proposed here might be, in theory, compromised as well.

However, using KMS reduces the threat to almost none, making it almost impossible for an attacker to gain access to the sensitive information stored in your database.

Although in this article I described using AWS as the service provider and AWS KMS as the solution, the same problem can be solved with other cloud providers' services, such as Microsoft Azure and its Key Vault service.

But that is a story for another time...
