These days, encrypting credit card data is standard practice. But what about other personal information? You may be thinking, "What's the point in encrypting personal information that's already public?" That's a fair question, but why give an unsavory individual any more information than you need to?
Picture this: some unknown individual has gained access to part of your database, perhaps through some sort of hacking method. Now let's set aside the stereotypical image of a wide-eyed nerd typing furiously as a flurry of green ones and zeroes flies across the screen, since this isn't actually how hacking works. Anyway, somebody has gained access to some of the data on your site. You can be reasonably sure your customers' credit card information is still safe, since it is encrypted. So your hacker might turn to your customers' other personal information instead.
At this point, the severity of the situation depends on the kind of information you're storing. Identity theft is probably the first concern if you're storing information like social insurance numbers or driver's license numbers. Still, even if you're only storing the basics, such as names, addresses and email addresses, there may be reasons to keep this data encrypted:
- A hacker with some sort of vendetta could spam your customers' email addresses.
- They could post your customers' information publicly.
- They could pull juvenile pranks, like ordering pizzas to people's houses.
- And don't forget the general humiliation of having people's information leaked. People usually don't like this.
There was actually a fairly recent, high-profile incident involving a website called Ashley Madison, which essentially exists to help married people cheat on their spouses. Now, please let it be known that neither I nor Acro Media in any way endorses this site or condones its use. In case you are not familiar, the site's security was recently compromised, and the personal data of millions of its users was posted online for public consumption. Setting aside the moral implications of using this site, as well as those of exposing its users, one has to wonder whether the company that owns it wishes it had done more to keep that information out of the public eye.
The solution to this may seem obvious: Just encrypt your customers' personal information as you did with their credit cards. But don't forget that there's usually a downside to adding more security.
Of course, this presents a problem when you try to search for a customer named John Doe, because his now-encrypted name looks like a long string of seemingly random characters, and his first name, John, encrypts to a different string that is just as long and complicated.
Clearly, searching for "John" when you want to find information about John Doe isn't going to get you anywhere. You might think you can simply decrypt all of the data before searching through it. Unfortunately, this isn't practical: every search would mean decrypting the information for every one of your users first. That adds massive overhead on top of the regular search time, and the overhead grows with your customer base.
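To make that overhead concrete, here is a minimal Python sketch of the decrypt-everything approach. The `toy_decrypt` function is a hypothetical placeholder: it uses Base64, which is not encryption at all, only so the example runs with the standard library. In a real system it would be a call to an actual cipher. The interesting part is the counter: every search decrypts every row, no matter how few rows match.

```python
from __future__ import annotations

import base64
from typing import Callable


def toy_decrypt(ciphertext: str) -> str:
    """Hypothetical placeholder for a real decryption call.

    Base64 is NOT encryption; it is used only so the sketch runs
    with the standard library alone."""
    return base64.b64decode(ciphertext).decode("utf-8")


def search_by_decrypting(
    encrypted_names: list[str], term: str, decrypt: Callable[[str], str]
) -> tuple[list[int], int]:
    """Return (indexes of matching rows, number of decrypt calls made)."""
    matches: list[int] = []
    decrypt_calls = 0
    for row, ciphertext in enumerate(encrypted_names):
        # Every row must be decrypted before it can be compared.
        plaintext = decrypt(ciphertext)
        decrypt_calls += 1
        if term.lower() in plaintext.lower():
            matches.append(row)
    return matches, decrypt_calls


names = ["John Doe", "Jane Roe", "Johnny Smith"]
stored = [base64.b64encode(n.encode("utf-8")).decode("ascii") for n in names]

print(search_by_decrypting(stored, "john", toy_decrypt))  # ([0, 2], 3)
```

With three customers that's three decryptions per search; with a few hundred thousand customers, it's a few hundred thousand.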
And this brings us to the highly-fascinating-if-you-are-a-geek-like-me topic of searching on encrypted data. As with any problem, there are a few ways of dealing with this. And as with most problems relating to security, there's a tradeoff between efficiency (i.e. runtime) and security. Here are a few of them:
Use the same encryption in your search
There's a fair bit that goes into doing this, involving a database and possibly a hash table, but basically, instead of searching for the text itself, your system runs the search term through the same cipher and then searches the data for the encrypted version. So when you do a search, you type whatever you want to search for, e.g. "John Doe". Your system turns it into the same jumble of nonsense that is stored in the database, looks for a match, and returns any results it finds. For this to work, the cipher has to be deterministic, meaning the same input always produces the same output. This is likely the simplest solution, but it has a major drawback: you can only return results that exactly match your search term. So if your user's entire name is stored in a single field, for example, you won't be able to find our friend John Doe by searching for "John", since "John" would translate into a jumble of numbers and letters that is completely different from the jumble that is "John Doe".
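Here's a minimal Python sketch of the idea, under a few assumptions. Instead of a deterministic cipher, it uses a keyed hash (HMAC-SHA256), a technique sometimes called a "blind index": a searchable token stored alongside the encrypted field, while the field itself stays encrypted. The key, the `blind_index` and `find_by_name` helpers, and the lowercasing step are all hypothetical choices for illustration.

```python
from __future__ import annotations

import hashlib
import hmac

# Hypothetical key; in practice it would live in a secrets store,
# separate from the database.
INDEX_KEY = b"example-index-key-not-for-production"


def blind_index(value: str) -> str:
    """Deterministic keyed hash (HMAC-SHA256) of a normalized value.

    The same input always yields the same token, which is what makes
    equality search possible. It sits alongside the encrypted field;
    it does not replace encrypting the field itself."""
    normalized = value.strip().lower()
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Each row would also hold the encrypted name (omitted here).
customers = [
    {"id": 1, "name_index": blind_index("John Doe")},
    {"id": 2, "name_index": blind_index("Jane Roe")},
]


def find_by_name(term: str) -> list[int]:
    """Transform the search term the same way, then compare tokens.

    In a real database this would be an indexed WHERE clause."""
    token = blind_index(term)
    return [row["id"] for row in customers if row["name_index"] == token]


print(find_by_name("john doe"))  # [1]
print(find_by_name("John"))      # [] : a partial term gives an unrelated token
```

Normalizing before hashing means "john doe" and "John Doe" produce the same token; without that step, even a capitalization difference would break the match.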
Furthermore, the performance of your search will likely take a hit, since every search term now has to be encrypted before the lookup. This may not be a big hit, but it's still something to keep in mind. This approach may also not be entirely secure. Although you're hiding the information itself, certain things can still be seen, such as the number of times a particular ciphertext appears. The most frequent value in a last-name column, for instance, is likely a common surname. For more information on this, check out this article.
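The frequency leak is easy to demonstrate. This sketch, again assuming a hypothetical HMAC key and made-up example names, builds the kind of column an attacker would see: tokens only, no names. Counting identical tokens still reveals which value is most common, and in a last-name column that is a strong hint about what the value is.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key; the attacker is assumed NOT to know it.
INDEX_KEY = b"example-index-key-not-for-production"


def token(value: str) -> str:
    """Deterministic keyed hash of a normalized value (HMAC-SHA256)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()


# Example data; the attacker only ever sees leaked_column.
last_names = ["Smith", "Doe", "Smith", "Nguyen", "Smith", "Doe"]
leaked_column = [token(name) for name in last_names]

# Identical plaintexts give identical tokens, so counting still works.
most_common_token, occurrences = Counter(leaked_column).most_common(1)[0]
print(occurrences)  # 3

# We can confirm the guess only because we hold the key; an attacker
# would instead guess "the most common surname in the population".
print(most_common_token == token("Smith"))  # True
```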
Use a more sophisticated solution
The topic of encryption is a rich one, with lots of nooks and crannies to be explored. As such, there are many groups trying to find ways to search encrypted data efficiently without compromising security.
Here's a paper from the University of California, Berkeley that describes one such solution, using plenty of mathematical expressions and big O notation.
If this all seems intimidating, that's probably because it kind of is. It's all well and good to talk about intense algorithms, but it's another thing entirely to implement them. Keep in mind that this sort of work is done by academics who have the luxury of working in an idealized world of theory, where they don't have to worry about budget constraints or timelines. The bottom line is that implementing these sorts of solutions can be expensive and complex. So while it may sound awesome to keep your data nicely encrypted and still be able to access it quickly, consider whether it's worth the trouble to get it all working.
This brings us to the final solution.
Only encrypt the data that is necessary
One of the reasons encrypting credit card data is such an automatic decision is that there is rarely any need to search by it. How often do you say to a co-worker, "Hey, can you do me a favour and find the name of the customer whose credit card number is xxxx-xxxx-xxxx-xxxx?" Most likely you'll want to search by name, city, etc. So encrypting first and last names, which are probably the most common search terms and likely the most public information you hold, might be silly. Encrypting a SIN (or SSN) would make sense, though, since it is probably not a common search term but is much more sensitive.
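One way to put this into practice is a per-field policy: encrypt only the fields that are sensitive and rarely searched, and leave the common search fields in plain text. The sketch below assumes a hypothetical `ENCRYPTED_FIELDS` set and takes the cipher as a parameter; `placeholder_encrypt` is only a stand-in so the example runs, not real encryption, and the customer records are made up.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical policy: only fields listed here are encrypted at rest.
ENCRYPTED_FIELDS = {"sin"}


def placeholder_encrypt(value: str) -> str:
    """Stand-in for a real cipher call; it does NOT protect anything."""
    return "[encrypted]"


def prepare_for_storage(
    record: dict[str, str], encrypt: Callable[[str], str]
) -> dict[str, str]:
    """Encrypt sensitive fields and leave searchable ones as plain text."""
    return {
        field: encrypt(value) if field in ENCRYPTED_FIELDS else value
        for field, value in record.items()
    }


def search(rows: list[dict[str, str]], field: str, term: str) -> list[dict[str, str]]:
    """Case-insensitive substring search on a plain-text field."""
    if field in ENCRYPTED_FIELDS:
        raise ValueError(f"{field!r} is encrypted and cannot be searched this way")
    return [row for row in rows if term.lower() in row[field].lower()]


rows = [
    prepare_for_storage(
        {"first_name": "John", "last_name": "Doe", "city": "Springfield", "sin": "000-000-000"},
        placeholder_encrypt,
    ),
    prepare_for_storage(
        {"first_name": "Jane", "last_name": "Roe", "city": "Shelbyville", "sin": "111-111-111"},
        placeholder_encrypt,
    ),
]

print([row["last_name"] for row in search(rows, "first_name", "john")])  # ['Doe']
print(rows[0]["sin"])  # [encrypted]
```

Names and cities stay fast to search with ordinary queries, while the one field that really matters for identity theft never sits in plain text.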
Naturally, the best solution may simply be not to encrypt publicly available personal information at all (unless you are Ashley Madison). When you encrypt your customers' personal information, you're potentially spending a substantial amount of money on a solution that will slow down your site's search somewhat, all for the sake of protecting data that is already publicly available. Keep in mind, too, that encryption is already a secondary measure for the unlikely event that somebody breaks through your primary security measures.