MVP: Account database scheme suggestion

MartijnWeterings · 3 December 2019 22:59

Is there no risk of relying on third-party authentication (e.g. when they change arrangements or when certain countries get blocked)? (although, whenever the third party blows up then this problem might be solved/recovered if we keep account recovery email accounts)

Recently the political stability of the internet and of the businesses involved in it has been a bigger concern to me than the need to change standards for hashed passwords (but this is me being paranoid about governments and about corporations that are bigger than they should be).

Maybe this is too much of a novice’s look at it, but when it is just about ‘appropriately’ hashing, then why can’t we simply remove the simpler hashed versions when ‘appropriately’ changes?

(and use a more strongly encrypted backup version for verification whenever the simpler version has been removed?)

Or is this about people having passwords like 123456, which no hashing can protect against? So it is about us needing to protect the database of stored salts and hashed passwords, no matter how well they are encrypted (because the passwords might be bad)?

cellio · 3 December 2019 23:05

I am very much a novice when it comes to credentials/authentication stuff, but would we address the concerns of (a) risk of being in the DB and (b) risk of relying on third-party providers by having a separate identity-management piece that we host ourselves? Essentially, we would treat it as a third-party provider, except we’d be the third party.

I guess I’m talking here about the instance that we plan to provide. The Codidact platform would support third-party credentialing, and we would also offer a system of our own for doing that. Any given instance can use ours, skip that and use Google, or do something else.

(If we go this route, I’d still suggest using somebody else’s system for MVP, and then adding our own later to address concerns about geo-blocking/flakiness/whatever.)

gilles · 3 December 2019 23:10

Speaking as a security professional, I recommend that we do store passwords.

Depending exclusively on a third-party service is a risk. What if they go out of business? What if they change their conditions in a way that becomes unacceptable to us?

Depending on a third-party service is an acceptance barrier. Not everyone may be willing or able to use that service. I’m not going to create a Facebook account. Some people live in countries where Google is blocked.

By all means offer the option to log in via a third-party service, but there should be a way to authenticate solely through the site.

Passwords need serious security scrutiny, but they’re not that hard to handle. Take a robust cryptography library that includes password hashing, and the code to add on top is easy to audit. Easier than the code to implement third-party authentication.
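
To give a sense of how little code sits on top of such a library, here is a minimal sketch using scrypt from Python's standard `hashlib`. The function names and the cost parameters are assumptions chosen for illustration, not something proposed in this thread; a real deployment would pick its library and parameters after review.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Illustrative cost parameters (assumptions, not a vetted recommendation).
SCRYPT_N = 2**14   # CPU/memory cost
SCRYPT_R = 8       # block size
SCRYPT_P = 1       # parallelism
KEY_LEN = 32       # length of the derived digest in bytes


def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt over the password with the given salt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=KEY_LEN,
    )


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a new password, using a fresh random salt."""
    salt = secrets.token_bytes(16)
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)
```

Only the salt and the digest would be stored; the password itself is never written anywhere.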

manassehkatz · 3 December 2019 23:28

(As I’m writing this @gilles reply showed up - which agrees with much of what I am writing.)

There are a lot of pieces to balance here. With systems for my customers, any part requiring login/authentication is typically a private system (i.e., employees or my customer’s customers - a limited group, not “anyone can sign up”), with everyone involved always (or almost always) accessing it in the US, and in general fewer problems to worry about. I have learned a lot about security over the years but I don’t consider myself an expert. That being said:

I really like the idea of not relying on a 3rd-party system. I don’t mind having it as an option, but personally I find it a bit creepy to use a Google (for example - and I like Google) account to get into a non-Google system. So almost always I skip that kind of thing - i.e., if I go to a system and it offers “Sign in with Google, Facebook, etc. or create an account”, I will create an account. If it doesn’t clearly offer a “create an account” option and I don’t really need the system, I go away. Seriously. I often find such things by way of Google News or just searching for stuff - it becomes a real barrier to me and I simply stay away. I don’t want that to happen to Codidact.

In addition, there are the very real concerns of blocked systems. We can’t guarantee a Codidact won’t be blocked, but if it gets blocked for authentication then it gets blocked for reading/posting and vice versa, and we see what we can do to mitigate the problem. But if it is a 3rd-party system we have no control.

That being said, some people like this way of using the internet, so offering it (by “it” I mean “login by piggybacking on Facebook or some other system”, however that works, which I’ve never totally understood) makes sense if it is easy to do. The other alternative - a 3rd-party system but exclusive to us (i.e., only for logging in to Codidact but not for other systems) - doesn’t make much sense to me for a number of reasons: it adds the “3rd party system that could disappear or be blocked, etc.” problem; it means we need to have emails in two places (i.e., it doesn’t avoid the issue of storing email addresses, which is not a big deal to me but is to some, since we still need them for notifications and the 3rd party service needs them as well); and it makes the Codidact system as a whole impossible for someone to copy and implement without contracting with a 3rd party as well.

As far as the fear of passwords goes: it really isn’t that hard to do reasonably correctly. Passwords need to be salted & hashed; as I understand it, when done correctly that alone mitigates a ton of problems by making it extremely hard for someone who steals the database to actually figure out the passwords. (If someone uses 123456, the password can be figured out by pure brute force methods - no database access needed.) Making the authentication system a separate internal part of our site can somewhat mitigate any risks, though I caution that hashing has to be done at the right place or you actually can (technically) increase risk - though if everything is on the same physical server it really isn’t that big a deal. Plus we aren’t talking classified national security here. So reasonable security measures are needed but not extreme measures. We can also increase security optionally via two-factor security or other methods, but keep in mind that anything that makes it harder for new users will effectively stop some new users, and we don’t want to do that.

MartijnWeterings · 3 December 2019 23:33

I always thought that those brute force methods were employed on a list of hashes from an already hacked database. The site should protect against brute force guessing of passwords, shouldn’t it (e.g. by temporarily blocking access after some failed attempts)?

manassehkatz · 3 December 2019 23:42

There are two types of brute force attacks:

1. Sit at a terminal (or have your robot process pretend to sit at a terminal) and try various common passwords. This is mitigated by (a) having reasonable password requirements (e.g., mix of character types, minimum length, check for common strings) and (b) limiting frequency (e.g., no more than once every 10 seconds) & number of attempts allowed (e.g., max. 5 attempts before blocking for 1 hour).
2. Dictionary attack. Take a compromised database of hashes and try “everything” until you find matches. Once you do that, you have the basic key and access to one account. As I understand it, salting properly (even if the salt is stored with the hash, which I find pretty amazing) makes it so that compromising one account does not automatically mean the bad guys can get into all the other accounts.

gilles · 4 December 2019 00:07

Not quite. Please browse https://security.stackexchange.com/tags/passwords, in particular How to securely hash passwords? and What are rainbow tables and how are they used?

That’s an online attack.

No, that’s counterproductive. See XKCD #936: Short complex password, or long dictionary passphrase?

Yes, that’s how you block online attacks. For a website, beware not to overly block, otherwise an attacker can easily cause a denial of service by making failed login upon failed login.
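
One way to slow down online guessing without handing attackers an easy lockout is a capped, growing delay instead of a hard block. The sketch below is only an illustration: the `LoginThrottle` class, its parameter values, and the idea of keying on an (account, client address) pair are all assumptions, not a design agreed in this thread.

```python
import time
from typing import Dict, Hashable, Optional


class LoginThrottle:
    """Per-key failure counter with a capped, exponentially growing delay."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 60.0,
                 free_attempts: int = 3) -> None:
        self.base_delay = base_delay        # delay after the first penalised failure
        self.max_delay = max_delay          # the delay never grows beyond this
        self.free_attempts = free_attempts  # failures tolerated before any delay
        self._failures: Dict[Hashable, int] = {}
        self._next_allowed: Dict[Hashable, float] = {}

    def allowed(self, key: Hashable, now: Optional[float] = None) -> bool:
        """True if an attempt for this key may proceed at time `now`."""
        now = time.monotonic() if now is None else now
        return now >= self._next_allowed.get(key, 0.0)

    def record_failure(self, key: Hashable, now: Optional[float] = None) -> float:
        """Register a failed login and return the delay imposed on the next attempt."""
        now = time.monotonic() if now is None else now
        count = self._failures.get(key, 0) + 1
        self._failures[key] = count
        extra = count - self.free_attempts
        if extra <= 0:
            delay = 0.0
        else:
            delay = min(self.max_delay, self.base_delay * 2 ** (extra - 1))
        self._next_allowed[key] = now + delay
        return delay

    def record_success(self, key: Hashable) -> None:
        """Forget the failure history for this key after a successful login."""
        self._failures.pop(key, None)
        self._next_allowed.pop(key, None)
```

Keying on something like `("alice", "203.0.113.5")` rather than on the account alone means failures from one address do not delay the real user logging in from another, and the cap on the delay limits how much an attacker can inconvenience anyone.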

That’s an offline attack.

Not quite. The point of salting is to arrange that when the attacker “tries everything”, that’s only everything for the one account that has this particular salt value, and they have to do completely new computations for the next account which has a different salt.
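
A small sketch of that point: two accounts with the same weak password end up with unrelated stored values, so a guess computed against one salt tells the attacker nothing about the other account. PBKDF2 from Python's standard library and the iteration count are stand-ins chosen for illustration; the helper name `slow_hash` is made up.

```python
import hashlib
import secrets


def slow_hash(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash (iteration count is illustrative only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Two accounts that happen to use the same weak password.
salt_a = secrets.token_bytes(16)
salt_b = secrets.token_bytes(16)
stored_a = slow_hash("123456", salt_a)
stored_b = slow_hash("123456", salt_b)

# The stored values differ even though the passwords are identical.
print(stored_a != stored_b)

# An attacker's guess computed with account A's salt matches A's record,
# but has to be recomputed from scratch with B's salt to test account B.
guess_for_a = slow_hash("123456", salt_a)
print(guess_for_a == stored_a, guess_for_a == stored_b)
```

The salt does nothing to make "123456" a better password; it only removes the attacker's ability to reuse one computation across many accounts.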

ArtOfCode · 4 December 2019 00:44

While I’m not a security professional, I handle a few services at scale that have to store both user authentication data and individual authentication keys for external services (i.e. data that must not be disclosed), so I have a fair bit of experience in how to handle this.

I have to agree with @gilles. Storing passwords correctly requires some experience, but it’s not a difficult thing to do. We should be doing this ourselves; the risk of doing it incorrectly (especially if we’re using widely-used, agreed-secure authentication libraries, which we should) is far outweighed by the risk of relying on a third party to authenticate.

WRT Google/Facebook/etc login, that’s a different matter. While there’s usually some authentication data to store for these logins, there’s no password to store, and the methods for its storage are usually different. The protocol by which these logins work is designed to help make them secure. We should also offer these, but that’s more for user convenience than because it’s somehow more secure.

HappyHacker · 4 December 2019 01:07

I’m also not a security professional, but I have a lot of security experience. And I’ve seen countless, countless homegrown systems that have serious security flaws.

To clarify: I’m not suggesting only supporting social or external logins. Auth0 is a service that provides username/password logins, with social/external as an option. Auth0 would manage our auth database. With full cloud scalability. With a dedicated security team. For free.

There is a risk: if Auth0 goes out of business, our users will need to reset their passwords. That’s the only risk. Compared to the risk of our database being compromised (which is almost inevitable), I’d much rather take the risk of Auth0.

manassehkatz · 4 December 2019 01:09

@gilles First of all, just to reiterate:

so I expect others (e.g., you) to get this more correct than me. I tend to speak more in general terms on this type of thing.

Actually, I just saw that recently. And you are correct. Really the end result of that goes largely (in terms of bits of entropy) to “minimum length” - a short string with “stuff” in it is marginally better than a short string of lower case alpha. But a long string of lower case alpha is better than any short string. etc. And of course, aside from “getting it right”, crazy password requirements end up frustrating the real users, which is never a good thing.

Agreed.

I think we agree on this one, just in different language (yours is clearer).

raphaelschmitz · 4 December 2019 09:01

I don’t remember where I read it (I think it was on the security stack), but that info is outdated.

Because the attackers have also read this and now it’s out of fashion to try “aaaa”, then “aaab”, then “aaac” and so on.

Now what they do is start with a huge dictionary of words. “Password”, then “Banana”, then “BananaBattery” etc.

alerque · 4 December 2019 12:06

No, it is not outdated. The security benefits discussed rely on math, not on the obscurity of the method used. Any scheme that relies on hackers being less up on the latest fashions than users is doomed anyway.

The point is not whether somebody trying to brute force a hash starts with trying “aaa→aab” or “apple apple apple→apple apple banana”, but how big the key space is. Even with every symbol on an ASCII keyboard the character set is not very large, so you have to build entropy by making it a bit longer. While using words means the raw character length no longer represents as much entropy as using random letters, the number of words in a dictionary means entropy accrues much more rapidly and the end result is easier for humans to remember. Double win. Easier for humans, harder for computers. The math is against the hackers and the language is more favorable to the user.
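
The key-space argument can be put into numbers with a short sketch. It assumes every symbol (character or word) is picked uniformly at random, which is the assumption the XKCD comic makes; the 2048-word list size is the one used in that comic, and the function name is made up for this example.

```python
import math


def entropy_bits(choices: int, length: int) -> float:
    """Bits of entropy when each of `length` symbols is drawn uniformly
    at random from `choices` equally likely options."""
    return length * math.log2(choices)


# Random printable-ASCII passwords (94 symbols) of growing length.
for n_chars in (8, 10, 12):
    print(f"{n_chars} random characters: {entropy_bits(94, n_chars):.1f} bits")

# Random passphrases from a 2048-word list of growing length.
for n_words in (4, 5, 6):
    print(f"{n_words} random words:      {entropy_bits(2048, n_words):.1f} bits")
```

Each added word contributes 11 bits against roughly 6.6 bits per added character, which is the sense in which entropy accrues faster per symbol; how the totals compare depends on the lengths and list sizes chosen, and none of it holds if a human picks the words instead of a random generator.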

raphaelschmitz · 4 December 2019 16:48

No, it is not outdated.

Everything you say is correct for your perspective as a user.
However, in this thread we were talking about how to implement identity as developers.

Manassehkatz mentioned the usual password requirements and was told those were “counterproductive”. So the 2 options to evaluate here are

A)
A sort of “standard” form, like 8 char min length, min one upper case and lower case letter, min one number, min one special character.

B)
Some kind of experimental form trying to make people use XKCD style passwords. There wasn’t really any elaboration about how, so I can only guess:

Definitely without those character restrictions.
Possibly with some kind of info text to explain the style.
Let’s not fool ourselves, a lot of people will ignore this.
Possibly with a huge minimum character length
e.g. “correct horse battery staple” is already 25 chars. To quote your link here, “Security at the expense of usability comes at the expense of security.”

Apart from the usability issues with that, here’s a summary of how the math can quickly become worse for this.

Don’t think about how secure you make your password.
Think about how secure our technology will make all users’ passwords.

GrumpyCrouton · 4 December 2019 17:06

The double-edged sword here comes from what you said:

Let’s not fool ourselves, a lot of people will ignore this.

We can focus all we want on password policy, and we can show our users what makes a good password, but I wouldn’t say there is a very good way to force this type of password to be used. My point is that we should do as much as we can to protect the passwords from hackers or database leaks, but at the end of the day the user who chooses a weak password on purpose can’t really be helped much. That said, our system should still make it difficult for online attacks to be done, and we should of course store everything in a way that is not easily reversible to plaintext.

manassehkatz · 4 December 2019 17:22

Actually, my favorite hosting company (ICDSoft), recently added a password “guide” of sorts when adding email addresses. I am sure it is not the only one out there, just using as an example. It works something like this:

Password field starts out blank, with “password strength = N/A”
“Generate” button next to password field. Click it and you have choices of:
- Random = “any” characters mixed together, default length = 12
- Pronounceable = random lower case alpha, default length = 15
- Passphrase = random lower case alpha words, default length = 5 words (i.e., this is the XKCD thing!)
You can regenerate if you don’t like what you get. You can adjust the length to make it stronger or weaker.
Presumably (but I have not analyzed) the 3 password types with default lengths are of similar entropy/security level.
When you select a password or type one in by yourself, you get a password strength (Very Weak, Weak, Fair, Strong, Very Strong) - obviously the Generate button (with default length at least) always comes up with Very Strong.
Weak or Very Weak are NOT accepted - i.e., you can’t add an email account unless the password meets a minimum of “Fair”

I can ask ICDSoft about the algorithm/library used. There is a good chance they will tell me - they are incredibly helpful - and if it is based on open source software with a reasonable license then I think this is worth looking into. Even if not, this basic methodology seems like a good way to force the user to do “the right thing” while giving them the option of gibberish-but-shorter vs pronounceable-but-longer vs. passphrase (XKCD style - longer to type but easier to remember). And if people want to do their own thing with their own password generator, they can do that too - as long as it meets some minimums (based on entropy, not based on arbitrary “1 of this and 1 of that and min. length”).
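
As a rough sketch of what such a generator could look like (this is not ICDSoft's algorithm, which is unknown here): the function names, the tiny demo word list, and the strength thresholds below are all made up for illustration. The entropy figure is only meaningful for passwords the generator itself produced at random; it says nothing about a password a user typed in.

```python
import math
import secrets
import string

# Placeholder list for illustration only; a real generator would load a
# large, vetted word list with thousands of entries.
DEMO_WORDS = ["apple", "truck", "ladder", "cat",
              "horse", "battery", "staple", "correct"]

# Letters, digits and punctuation: 94 printable ASCII symbols.
RANDOM_ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_random(length: int = 12) -> str:
    """'Random' mode: any printable ASCII characters mixed together."""
    return "".join(secrets.choice(RANDOM_ALPHABET) for _ in range(length))


def generate_lowercase(length: int = 15) -> str:
    """'Pronounceable' mode as described above: random lower-case letters."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def generate_passphrase(words, count: int = 5, sep: str = " ") -> str:
    """'Passphrase' mode: words drawn independently and uniformly from `words`."""
    return sep.join(secrets.choice(words) for _ in range(count))


def generated_entropy_bits(choices: int, length: int) -> float:
    """Entropy of a generated secret: `length` uniform picks from `choices` options."""
    return length * math.log2(choices)


def strength_label(bits: float) -> str:
    """Map entropy to a label. The thresholds are hypothetical."""
    if bits < 30:
        return "Very Weak"
    if bits < 45:
        return "Weak"
    if bits < 60:
        return "Fair"
    if bits < 75:
        return "Strong"
    return "Very Strong"
```

For example, `strength_label(generated_entropy_bits(len(RANDOM_ALPHABET), 12))` rates the default "Random" mode, while the same call with `len(DEMO_WORDS)` and 5 shows why an eight-word demo list is far too small for real passphrases.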

GrumpyCrouton · 4 December 2019 17:50

I know in the past I’ve seen a lot of articles claiming not to trust password strength meters, because some of them are very misleading (and as a user, you never know which password strength meter library you are using, so there is no way to gauge its usefulness). I can’t find any more recent articles about this, so perhaps this sentiment has changed in the last few years. I’d say if we go this route, we need to really stress test it to make sure it’s accurate.

ArtOfCode · 4 December 2019 18:28

What @alerque is saying is correct. If information about security makes your system less secure, you’re relying on security by obscurity, and you have no security at all.

The accepted methods of password security rely on mathematically-proven, computationally-difficult algorithms. Disclosing what’s used or how they work does not make them less secure.

raphaelschmitz · 4 December 2019 18:46

I never said he was wrong

However, I don’t know why a second person is now mentioning security by obscurity to me. Counting on attackers not knowing about XKCD style passwords is what I argued against.

I also don’t understand what you’re trying to say with your second paragraph - how we hash things doesn’t really influence password requirements, does it? It all works on bytes nowadays anyway.

gilles · 4 December 2019 21:47

The point of XKCD-style passwords is precisely that there is a mathematically proven minimal strength, i.e. even an attacker who knows the exact method can’t break them in a realistic time, unlike L33t-style passwords.

raphaelschmitz · 4 December 2019 22:24

I feel like you’re thinking of e.g. 4 words with 25 characters total as if they were 25 random characters. Let me just quote from the article I linked earlier:

The average vocabulary of an English speaker might be about 15,000 words. If you choose a 4 word password then you would have about 15,000 choices for each one.
The total password space is therefore 15000^4 = 5 x 10^16 combinations. In all honesty it is better to choose from common and easy to remember words and this is the advice that XKCD gives.
For example, no one is going to choose ecumenical-virology-assuage-malefactor as their password, it’s going to be something more along the lines of apple-truck-ladder-cat.
So this will reduce that word space, possibly down to something like 2000 or so for a typical person.
The actual total password space is likely reduced down to something like 2000^4 = 1.6 x 10^13 combinations.

There are 26 lowercase letters, 26 uppercase letters, 10 numbers and about 32 symbols. This all adds up to 94 characters from which to choose your password.
The total password space is therefore 94^8 = 6.1 x 10^15 combinations.
This password space is almost 400 times larger than the XKCD password space.
If you are paranoid and you crank your passwords up to 10 characters that password space is now a very large 5.4 x 10^19 passwords. This is 3.3 million times larger than the XKCD password space.

(Emphasis Mine)
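
The figures in the quote can be reproduced with a few lines of arithmetic. The variable names are made up for this check, and the 15,000 and 2,000 word counts are the quoted article's assumptions, not measured values.

```python
# Passphrase spaces under the quoted article's vocabulary assumptions.
words_full_vocab = 15_000 ** 4     # 4 words from a 15,000-word vocabulary
words_common_only = 2_000 ** 4     # 4 words from ~2,000 common words

# Random-character spaces over the 94 printable ASCII symbols.
chars_8 = 94 ** 8
chars_10 = 94 ** 10

print(f"15000^4 = {words_full_vocab:.1e}")
print(f"2000^4  = {words_common_only:.1e}")
print(f"94^8    = {chars_8:.1e}")
print(f"94^10   = {chars_10:.1e}")

# Ratios the quote describes as "almost 400 times" and "3.3 million times".
print(f"94^8 / 2000^4  = {chars_8 / words_common_only:.0f}")
print(f"94^10 / 2000^4 = {chars_10 / words_common_only:.2e}")
```

The comparison is sensitive to the assumed word count: it sets a reduced, human-chosen vocabulary against characters that are assumed to be fully random.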