Storing PII securely

gilles · 4 November 2019 22:59

From day 1 we’ll have to store personally identifying information:

User identifiers such as email addresses, OpenIDs or other authentication tokens, etc.
User names and content of user profiles
Connection logs
Not exactly PII but related: passwords (if we use passwords)

How do we store all of this securely? Note that it’s not just about public relations, it’s also about legal requirements (e.g. GDPR). Where (in which jurisdiction) is it stored? Who has access?

manassehkatz · 5 November 2019 01:26

Passwords:
Salt & hash with an industry-standard password-hashing algorithm (e.g. bcrypt, scrypt, Argon2 or PBKDF2). That’s the easy part because passwords only need to be verified, never read back.
User names and content of user profiles:
Not sure what the concern is here, as that information will, by and large, be publicly accessible.
Email addresses, phone #s (if used as backup verification method as many sites do), security questions (if we use them), etc. and any other PII:
Need to work on this. It should be encrypted, but since the data has to be decrypted in order to be used, the code needs to be written carefully to minimize where and for how long the plaintext is exposed.
Connection Logs
In my experience, connection logs (logs in general) are typically not encrypted but are stored as plain text. The most important thing is to store them in a location where they can’t be accessed directly, i.e. outside the directory tree that the web server serves.
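
To make the "salt & hash" point for passwords concrete, here is a minimal sketch using only the Python standard library. The choice of PBKDF2-HMAC-SHA256, the iteration count, the storage format and the function names are all assumptions for illustration; nothing in this thread has settled on an algorithm, and a mature authentication framework would normally handle this for us.

```python
import hashlib
import hmac
import secrets

# Assumed parameters for illustration; not values decided in this thread.
ITERATIONS = 600_000
SALT_BYTES = 16


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return 'iterations$salt_hex$hash_hex', safe to store in the database."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations_str, salt_hex, hash_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_str),
    )
    return hmac.compare_digest(digest, bytes.fromhex(hash_hex))
```

Only the output of `hash_password` is stored; the password itself is never written anywhere. Keeping the iteration count inside the stored record lets the cost be raised later without invalidating existing hashes.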

GDPR may be a big issue. As I understand it (and I really don’t understand it very well), GDPR can be an issue for any web server hosted in the affected area (which is something arguably we should avoid) but also for any relevant data about any users from the affected area, which is something we can’t easily avoid (and don’t want to avoid - we want users from around the world). Anyone with relevant experience with GDPR and similar issues?

ArtOfCode · 5 November 2019 18:46

I have some experience with GDPR. It’s not as onerous as folks think it is.

We don’t have to store PII encrypted. We can if we choose to, but it’s not a requirement. We simply have to ensure only authorized people have access to it, which can be achieved simply by controlling access to the deployment server.

As for access - keep it limited. The sysadmin for the deployment server will have access to things, of course. It’d probably be wise to have one or two developers with access, too. Anyone who has access, until we are a legal entity, must be listed with contact details in a public privacy statement. See here for an example.

luap42 · 5 November 2019 18:50

This is also under (broader) consideration in this post:

HeapUnderflow · 8 December 2019 21:35

I have experience with COPPA. For MVP, a requirement of “don’t ask age, and have a way to quickly delete PII if requested by a parent” is sufficient, although this is not the ironclad get-out-of-COPPA-free card that many think it is.

The COPPA regulations are currently in the very beginning stages of being revised, and will probably be different in a year or two.

Marc.2377 · 12 December 2019 15:52

How to store it securely: at first, by employing tried-and-true implementation techniques, such as mature authentication/authorization frameworks, and by separating personal data from non-sensitive data in a way that allows stricter security controls at every level: datastore, caching, application in-memory data, API methods and network.
Later, by requiring careful audits from project members experienced in information security, and by actually enforcing the strict security controls mentioned above.
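
As a rough illustration of separating personal data from non-sensitive data at the datastore level, the sketch below keeps the public profile and the PII in two tables. SQLite, the table and column names, and the helper functions are all hypothetical; they only show the shape of the idea, not a schema anyone has agreed on.

```python
import sqlite3

# Hypothetical schema: public data in `users`, PII isolated in `user_pii`.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL
);
CREATE TABLE user_pii (
    user_id INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    email TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_user(conn: sqlite3.Connection, display_name: str, email: str) -> int:
    with conn:  # both rows are committed together or not at all
        cur = conn.execute(
            "INSERT INTO users (display_name) VALUES (?)", (display_name,)
        )
        user_id = cur.lastrowid
        conn.execute(
            "INSERT INTO user_pii (user_id, email) VALUES (?, ?)", (user_id, email)
        )
    return user_id


def erase_pii(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete a user's PII while leaving the public profile row in place."""
    with conn:
        conn.execute("DELETE FROM user_pii WHERE user_id = ?", (user_id,))
```

With this split, access to `user_pii` can be restricted separately from the rest of the data, and a deletion request (GDPR erasure, or a parent's request under COPPA) becomes a single statement.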

Where (in which jurisdiction) it is stored is yet to be defined (I think)…

Who has access is also yet to be defined, at least in precise terms. Preliminarily, of course, as few people as necessary: maybe one sysadmin, one DB administrator… and one or two trusted developers, although these shouldn’t actually need real PII (dummy or anonymized data should serve well for all purposes I can think of right now), and if they do, access should be granted on a case-by-case basis.
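
One possible way to produce the dummy data mentioned above is to replace each email address with a keyed, non-reversible stand-in before a dataset is handed to developers. The function name, the key handling and the `example.invalid` domain are assumptions for illustration. Note that this is pseudonymization rather than true anonymization: anyone holding the key can re-derive the mapping from a known address, so the key has to stay on the production side.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, key: bytes) -> str:
    """Map an email to a stable fake address that does not reveal the original."""
    normalized = email.strip().lower().encode("utf-8")
    # Keyed hash: without `key`, the token cannot be recomputed from a guessed email.
    token = hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]
    return f"user-{token}@example.invalid"
```

Because the mapping is deterministic for a given key, rows that referred to the same user still line up in the sanitized copy, which is usually what debugging needs.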

In terms of the software, whether moderators or site/instance admins will have access to this data via the platform is something that hasn’t been discussed yet AFAIK (or has it?).
If it were up to me, I’d say no, at least not until there’s a demonstrated need for some specific PII access, at which point it could be implemented if the case for it holds up.