Some Best Practices for Web App Authentication

It’s rare that a significant amount of time will go by without me hearing about yet another leak of user credentials from some well-known site. In the interest of incrementally increasing the security of the web as a whole, here’s a checklist to consider when writing your next (or current!) web application.

The Basics

These are things that are effectively mandatory – if you’re not already doing them, you should probably start doing them as soon as you can.

Use SSL (https) for anything involving authentication.

Ideally, use SSL for everything; it’s the simplest way to make sure you’re using it in all the right places. If it’s not feasible to do that, however, you should at least be using it for anything related to authentication. This isn’t just limited to login pages – it also includes any pages that use your session cookie. If you protect the login request with SSL but then make requests to other pages (which send along the session cookie) over regular HTTP, your site can be attacked using what is known as session hijacking. In other words, any page that deals with a logged-in user probably should be served using SSL.
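One concrete part of this is marking the session cookie as Secure. That flag tells the browser never to send the cookie over plain HTTP, so an HTTP page can't leak it. Below is a minimal sketch using Python's standard `http.cookies` module. The cookie name `session` and the helper `make_session_cookie` are hypothetical.

```python
import secrets
from http.cookies import SimpleCookie


def make_session_cookie(name: str = "session") -> tuple[str, str]:
    """Create a random session ID and a Set-Cookie header value for it.

    The Secure flag tells the browser to send this cookie only over HTTPS,
    so requests to plain-HTTP pages can't expose it to an eavesdropper.
    """
    session_id = secrets.token_urlsafe(32)  # unguessable random ID
    cookie = SimpleCookie()
    cookie[name] = session_id
    cookie[name]["secure"] = True
    cookie[name]["path"] = "/"
    # OutputString() renders the cookie without the "Set-Cookie:" prefix.
    return session_id, cookie[name].OutputString()


if __name__ == "__main__":
    sid, header = make_session_cookie()
    print("Set-Cookie:", header)
```

The flag only helps if the pages that need the session are actually served over HTTPS. A browser that never sends the cookie over HTTP will simply appear logged out on HTTP pages.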

Use POST for anything submitting sensitive information (and in general, don’t write sensitive information into logs).

The URL of a request (which is where form variables submitted in a GET request end up) is typically written out to server logs; the contents of a POST body are not (unless you go out of your way to write them out somewhere). Storing hashed passwords in your database won’t buy you much if an attacker can just search through your logs and find the same information there.

On the same note, don’t write out any sensitive information to stdout or a log file. Avoid doing so even when debugging, just in case you happen to accidentally leave in such debugging code.
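Discipline is the real fix, but a logging filter can serve as a safety net for the occasional slip. Below is a minimal sketch using Python's standard `logging` module. The field names in the pattern (`password`, `passwd`, `token`) are assumptions that you would adjust for your own application.

```python
import logging
import re

# Assumption: the field names worth scrubbing in your app; extend as needed.
_SENSITIVE = re.compile(r"(password|passwd|token)=([^&\s]+)", re.IGNORECASE)


class RedactSecretsFilter(logging.Filter):
    """Scrub 'password=...'-style values from log messages before they are written."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message once, scrub it, and store the result so
        # every handler sees only the redacted text.
        record.msg = _SENSITIVE.sub(r"\1=[REDACTED]", record.getMessage())
        record.args = ()
        return True  # never drop the record, only rewrite it


if __name__ == "__main__":
    handler = logging.StreamHandler()
    # Attached to the handler, so records propagated from child loggers are covered too.
    handler.addFilter(RedactSecretsFilter())
    log = logging.getLogger("auth")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("login attempt user=%s password=%s", "bob", "hunter2")
```

This only catches values shaped like `name=value`. A password printed some other way would still slip through, which is why the advice above is to never log it in the first place.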

Hash your passwords with a strong, certified, and slow cryptographic one-way hashing scheme.

This means that you should be using something like bcrypt or a similar well-known hash function that has been designed to be slow. Don’t use MD5 or SHA-1, even though you’ll see a lot of existing code using them. Adding “HMAC” doesn’t magically make the hash slower, either – HMAC itself is not a hash function, and the most common implementations of it use fast hash functions (e.g. HMAC-MD5), not slow ones.

Why slow, you ask? Because a legitimate user won’t really mind if it takes a full second to process their login request – but someone trying to brute-force a password hash will have a much more difficult time if it takes a full second of CPU core time per password they try, as opposed to a millisecond.

It should hopefully go without saying, but don’t use two-way (reversible) encryption for passwords. There is absolutely no reason why you should ever need to recover the plaintext form of a password once it has been set – if a user forgets their password, you should use a separate process to reset it rather than giving it back to them. (See “Don’t use security questions” below for more on this topic.)

Ideally, design your system in such a way that you can easily change what hash algorithm you’re using – a common scheme is storing something like "algo|salt_string|hash" in the password field. That way, if you ever need to change the algorithm, you can simply require anyone with a stored value that starts with the old algorithm to reset their password, and store the new passwords in the same field (just using the new algorithm prefix). Note that bcrypt has this built in (the resulting strings it generates have metadata embedded, including a randomly-selected salt).
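Below is a minimal sketch of the "algo|salt_string|hash" storage scheme. It uses PBKDF2-HMAC-SHA256 from Python's standard library as a stand-in for bcrypt, which is a third-party package. Here the slowness comes from the large iteration count, not from HMAC itself. The algorithm label `pbkdf2_sha256_600k` and the 600,000-iteration figure are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Algorithm label -> iteration count. The label stored with each hash says
# which entry was used, so old and new hashes can coexist in one column.
ALGORITHMS = {"pbkdf2_sha256_600k": 600_000}
CURRENT_ALGO = "pbkdf2_sha256_600k"


def _derive(algo: str, password: str, salt: bytes) -> bytes:
    iterations = ALGORITHMS[algo]
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = _derive(CURRENT_ALGO, password, salt)
    return f"{CURRENT_ALGO}|{salt.hex()}|{digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    algo, salt_hex, digest_hex = stored.split("|")
    if algo not in ALGORITHMS:
        return False  # retired algorithm: the user must reset instead
    candidate = _derive(algo, password, bytes.fromhex(salt_hex))
    # compare_digest avoids leaking how many leading bytes matched
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def needs_reset(stored: str) -> bool:
    """True if this hash was made with an algorithm other than the current one."""
    return stored.split("|", 1)[0] != CURRENT_ALGO
```

`needs_reset()` performs the check described above: any stored value that doesn't start with the current prefix is flagged so the user can be asked to set a new password.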

Salt your hashes, using a separate salt for each string you hash.

Note that this is completely separate from the previous point. Using salts doesn’t make your hash function slower, nor does it make it take any longer to crack an individual password. No, the purpose of salting is to make it so that the time it takes to crack a single password is not the same as the time it takes to crack every password in your database. The thing that salts protect you against is what is known as a rainbow table.

The fundamental idea behind rainbow tables is “if we’re going to try a trillion possible passwords, let’s run the hash function on each of them once, and then check all of the hashes from the database against the result – that way we only have to run the hash function 1 trillion times, rather than N trillion times where N is the number of hashes in the database.” Proper salting makes it so that a given rainbow table would only be applicable to a single user’s hash, so using one would be no more efficient than brute-forcing each hash individually.
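A toy cracker makes this counting argument concrete. The sketch below deliberately uses a fast hash (SHA-256) purely to keep the demonstration quick. It counts hash computations rather than measuring time, and all of its functions are hypothetical.

```python
import hashlib
import secrets


def fast_hash(password: str, salt: bytes = b"") -> bytes:
    # Fast hash used ONLY for this illustration; store real passwords
    # with a slow scheme as described above.
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def crack_unsalted(stored: set[bytes], guesses: list[str]) -> tuple[set[str], int]:
    """One hash per guess, checked against every stored hash at once."""
    found, hash_calls = set(), 0
    for guess in guesses:
        hash_calls += 1
        if fast_hash(guess) in stored:
            found.add(guess)
    return found, hash_calls


def crack_salted(stored: list[tuple[bytes, bytes]], guesses: list[str]) -> tuple[set[str], int]:
    """Each (salt, hash) pair needs its own pass over every guess."""
    found, hash_calls = set(), 0
    for salt, digest in stored:
        for guess in guesses:
            hash_calls += 1
            if fast_hash(guess, salt) == digest:
                found.add(guess)
    return found, hash_calls


if __name__ == "__main__":
    passwords = ["hunter2", "letmein", "hunter2"]
    guesses = ["123456", "hunter2", "letmein", "password"]

    unsalted = {fast_hash(p) for p in passwords}
    salted = []
    for p in passwords:
        salt = secrets.token_bytes(16)
        salted.append((salt, fast_hash(p, salt)))

    print(crack_unsalted(unsalted, guesses))  # 4 hash calls cover every user
    print(crack_salted(salted, guesses))      # 12 hash calls: 3 users x 4 guesses
```

Note also that without salts, the two users who chose "hunter2" end up with identical stored hashes. That alone tells an attacker they share a password.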

Store your sensitive data separate from your regular data.

This right here is an extremely low-hanging-fruit kind of thing. The vast majority of web applications I see have a “users” table in their database with fields like “id, username, password, name, email, join_date, favorite_color, …” and so on. This seems reasonable to most people at first glance; it’s all of the information specifically about a single user.

The problem is that it’s impossible to be perfect forever when it comes to security, and one day, a developer might write something like "SELECT * FROM users WHERE id = 5" and then dump the results directly into JSON to provide data to the AJAX call they just implemented on their new-and-improved profile page. Sure, they’re only using the join_date and favorite_color fields it returns, but anyone who happens to inspect the AJAX response now sees the password hash for the user nestled away there.

This can be really hard to catch in code review because there’s nothing in the new code that explicitly talks about sensitive data – the sensitive data just happened to be swept up along with the rest and dumped out into the (relatively speaking) public eye. If the sensitive data were in a separate table, it’d still be relatively easy to access along with other user data ("SELECT users.*, sensitive.* FROM users JOIN sensitive ON users.id = sensitive.id") but such accesses would be much more explicit and obvious to those writing and reviewing the code.

One might argue that the answer here is “never use SELECT *” but the simple response is that a lot of the time, developers aren’t even writing SQL – perhaps they dumped an ORM object to JSON, which faithfully dumped all of the fields. Putting sensitive data in a separate table is a simple solution to the problem that works no matter how you’re interfacing with the database. Of course, if you’re really concerned with user security you can go a step farther and put sensitive data like PII in a completely different database to allow you to better protect it and audit its usage – but that’s a separate topic.
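Below is a minimal sketch of the split using Python's built-in `sqlite3` module. The table and column names (`users`, `sensitive`, `password_hash`) are hypothetical, and the stored hash is a placeholder string.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings keyed by column name
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL,
        join_date TEXT,
        favorite_color TEXT
    );
    -- Sensitive fields live in their own table, keyed by the same id.
    CREATE TABLE sensitive (
        id INTEGER PRIMARY KEY REFERENCES users(id),
        password_hash TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO users VALUES (5, 'alice', '2012-01-01', 'blue')")
conn.execute("INSERT INTO sensitive VALUES (5, 'placeholder-hash')")

# The careless "dump everything" query can no longer leak the hash...
row = conn.execute("SELECT * FROM users WHERE id = 5").fetchone()
profile_json = json.dumps(dict(row))
print(profile_json)

# ...while code that really needs it has to ask for it by name.
secret = conn.execute(
    "SELECT password_hash FROM sensitive WHERE id = ?", (5,)
).fetchone()["password_hash"]
```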

Give your users the freedom to use whatever passwords they want, above minimum security thresholds.

It is reasonable (and a good idea) to require a minimum password length. If you don’t, some fraction of your users will inevitably choose “123” or similar. Likewise, it’s completely reasonable to disallow passwords that are nothing but a dictionary word plus a few characters. These are both the kinds of restrictions that encourage better passwords because they rule out the kinds of passwords that would be fundamentally insecure.

What’s unreasonable, though, are sites which place strict upper limits on passwords. The most notorious occurrence of this is probably “please choose a 4-digit PIN” – but it pops up in plenty of other less obvious forms. For instance, some sites place a limit on password length, e.g. “passwords must be 8-20 characters.” The minimum is fine, but why limit passwords to 20 characters at most? Assuming you’re hashing the password anyway (see the hashing item above), it’s not any harder to handle a 100-character password than it is to handle a 20-character one. The 28-character “correct horse battery staple” is far harder to randomly guess (assuming the person doing the guessing doesn’t read xkcd) than a 20-character password of a similar nature, but still very easy to remember.

The same also applies to restricting what characters can be used in passwords. It’s okay to require something other than a letter (in order to increase the number of potential characters most users will use in their passwords above just “the alphabet”), but arbitrarily disallowing things like spaces or non-alphanumeric characters is silly – if you’re hashing the password anyway, it won’t make a difference in how you handle passwords, and it will annoy the users who want to use a password that uses characters you don’t allow.

It’s okay to put sanity restrictions in place – for instance, a 1000-character maximum for a password field is reasonable to prevent someone from bogging down your password-handling algorithm by throwing a gigabyte of data at it (no one is going to type out a thousand characters by hand, and the small fraction of people using password managers probably aren’t going to be using quite that many characters either).
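The policy above might be sketched as follows. The specific numbers (a 10-character minimum, and treating up to 4 extra characters as "a few") and the tiny word list are assumptions for illustration; a real check would use a much larger dictionary. The 1000-character cap comes from the paragraph above.

```python
MIN_LENGTH = 10       # assumption: pick your own minimum
MAX_LENGTH = 1000     # sanity cap from the text, not a security limit
MAX_EXTRA_CHARS = 4   # assumption: how many characters count as "a few"

# Assumption: a toy word list; use a real dictionary / common-password list in practice.
COMMON_WORDS = {"password", "dragon", "monkey", "sunshine", "letmein"}


def _is_word_plus_a_few(password: str) -> bool:
    lowered = password.lower()
    return any(
        word in lowered and len(lowered) - len(word) <= MAX_EXTRA_CHARS
        for word in COMMON_WORDS
    )


def password_problems(password: str) -> list[str]:
    """Return human-readable reasons to reject the password (empty list = OK).

    Note what is NOT checked: there is no character whitelist. Spaces,
    punctuation and non-ASCII letters are all fine, since the password is
    hashed (as UTF-8 bytes) rather than stored.
    """
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    if len(password) > MAX_LENGTH:
        problems.append(f"must be at most {MAX_LENGTH} characters")
    if password.isalpha():
        problems.append("must contain at least one character that isn't a letter")
    if _is_word_plus_a_few(password):
        problems.append("is too close to a common word")
    return problems
```

For example, `password_problems("correct horse battery staple")` returns an empty list, while `"Password123!"` is rejected only for being a common word plus a few characters.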

Put a reasonable upper limit on access attempts from a single location in a given time frame, and tell the user about failed access attempts.

In this context, “reasonable” is something on the order of 10-100 per 24 hour period. It’s high enough that no real user trying to remember what password they chose to use for your site is going to run into it, but low enough that an attacker trying to randomly guess passwords won’t make much progress.

Note that these kinds of limitations should be per location – you don’t want to make it possible for someone to lock someone else out of their account just by making a bunch of bogus login attempts. If a large number of different locations all try to access the same account, raise an alert and deal with the problem on a more specific basis.

Keep the user informed about attempts to access their account. This can be as simple as showing a “there were X failed login attempts since your last successful login” message when the user successfully signs in, but even better is to do this out of band – for instance, send the user an email after the 10th failed login attempt in a row. That way the user is made aware of the attack in a timely manner even if they aren’t frequently logging into your site. It also allows the user to figure out what might have happened if the attacker does eventually manage to access their account.
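Below is a minimal in-memory sketch of per-account, per-location limiting with a notification hook. Several values are assumptions: the rolling 24-hour window, the limit of 20 (within the 10-100 range above), the 10-consecutive-failure email threshold, and the `notify` callback. A real deployment would keep this state in a shared store rather than in process memory, and would define "location" more carefully than a raw string.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

WINDOW_SECONDS = 24 * 60 * 60   # assumption: a rolling 24-hour window
MAX_FAILURES_PER_WINDOW = 20    # assumption: within the 10-100 range above
NOTIFY_AFTER_CONSECUTIVE = 10   # email the user after this many failures in a row


class LoginThrottle:
    """In-memory failure tracking keyed by (account, location), not by location alone."""

    def __init__(self, notify: Callable[[str, int], None]) -> None:
        self._failures: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)
        self._consecutive: Dict[str, int] = defaultdict(int)  # failures since last success
        self._notify = notify  # hypothetical hook, e.g. sends an email

    def _recent(self, account: str, location: str, now: float) -> Deque[float]:
        # Drop failures that have aged out of the window.
        q = self._failures[(account, location)]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return q

    def allowed(self, account: str, location: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return len(self._recent(account, location, now)) < MAX_FAILURES_PER_WINDOW

    def record_failure(self, account: str, location: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._recent(account, location, now).append(now)
        self._consecutive[account] += 1
        if self._consecutive[account] == NOTIFY_AFTER_CONSECUTIVE:
            self._notify(account, self._consecutive[account])

    def record_success(self, account: str) -> int:
        """Reset the streak; return the count for a 'N failed attempts since your last login' message."""
        return self._consecutive.pop(account, 0)
```

Failures are counted per account and location pair. An attacker hammering alice's account from one address therefore doesn't lock alice out when she logs in from somewhere else.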

Use a single failure message regardless of whether a user is valid or a password was wrong.

Attackers can take advantage of separate responses for “invalid user” versus “wrong password” to check whether they have a valid account or not. By simply trying to log in with a bogus password and seeing whether or not the “invalid user” response comes back, they’re able to verify if that particular account exists. They can then try to figure out the password from other sources (perhaps checking to see if passwords are shared with a similarly-named account on a different, previously-compromised site).

Instead, just return the same response no matter what portion of the login attempt failed. For instance, “invalid username or password” is a simple, straightforward error message that doesn’t leak information about whether an account exists.
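In code, the key point is that both failure branches produce the identical response. Below is a minimal sketch. The in-memory `USERS` store and the `login` function are hypothetical. The password check uses a salted PBKDF2 hash with a deliberately low iteration count that is suitable only for a demo.

```python
import hashlib
import hmac
import secrets

GENERIC_FAILURE = "Invalid username or password."
DEMO_ITERATIONS = 1_000  # demo only; use a properly slow setting in production


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, DEMO_ITERATIONS)


_salt = secrets.token_bytes(16)
USERS = {"alice": (_salt, _hash("correct horse battery staple", _salt))}


def login(username: str, password: str) -> tuple[bool, str]:
    record = USERS.get(username)
    if record is None:
        # Unknown user: exactly the same message as a wrong password.
        return False, GENERIC_FAILURE
    salt, stored = record
    if not hmac.compare_digest(_hash(password, salt), stored):
        return False, GENERIC_FAILURE
    return True, "Welcome back!"
```

The message text is only part of it. The two cases should also not differ in other visible ways, such as the HTTP status code or redirect target.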

Don’t use “security questions” – they’re anything but.

Unlike passwords, which (theoretically) are completely arbitrary, security questions are generally the exact opposite – not arbitrary at all, but instead based on specific, immutable, often publicly-available facts about the user. For password resets, just email the user a link with a single-use, randomly-generated reset token that can be used to change their password to something new. The big-name email providers have been handling the problem of account access for much longer than you have; let them do the hard work of dealing with account recovery in the case where a user’s email is also compromised. Keep your end of things simple.

This also applies to any over-the-phone or other out-of-band account support you provide – if a user forgets their password, email them a reset link. If they lost access to their email account as well, let the email provider handle that case. (In the very rare situation where a customer is completely unable to recover their email account, you can handle that on a case-by-case basis, or simply choose to go the “tough luck” route.)

Don’t reset the user’s password for them. Doing so opens the door for an attacker to cause trouble by resetting a legitimate user’s password. Only change a user’s password if they use a correct, not-yet-used reset link. (You can also make reset links expire after a certain period of time.)
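Below is a minimal sketch of single-use, expiring reset links. Only a SHA-256 digest of each token is stored, so a leaked copy of the pending-token table can't be used directly. Several details are assumptions: the one-hour lifetime, the in-memory dict, and the `example.com` URL.

```python
import hashlib
import secrets
import time
from typing import Optional

TOKEN_LIFETIME_SECONDS = 60 * 60  # assumption: links expire after one hour

# token digest -> (username, expiry timestamp); a database table in practice
_pending: dict[str, tuple[str, float]] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_reset_link(username: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # random and unguessable
    _pending[_digest(token)] = (username, now + TOKEN_LIFETIME_SECONDS)
    # Hypothetical URL; email this to the address on file.
    return f"https://example.com/reset?token={token}"


def redeem_reset_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the token is valid, consuming it; otherwise None."""
    now = time.time() if now is None else now
    entry = _pending.pop(_digest(token), None)  # pop: a token works only once
    if entry is None:
        return None
    username, expires_at = entry
    if now > expires_at:
        return None
    return username
```

Requesting a reset changes nothing on its own. The password is changed only when a valid token is redeemed together with the new password the user chose.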

Going the Extra Mile

These things are not necessarily mandatory, but are still good ideas. You should at least consider incorporating them.

Two-factor authentication

Two-factor authentication is based around the idea of needing two different things (factors) to log into an account. Generally, the first thing is “something you know” (usually a password), and the second thing is “something you have” (typically either a purpose-built device, or nowadays, a smartphone). The reasoning is that it is far harder to both figure out your password and obtain access to your smartphone than to acquire just one or the other. Your phone might get stolen, but the thief probably doesn’t know your password. Similarly, someone might guess your password, but they probably don’t have your smartphone. And if your smartphone did go missing, you’d likely notice.

Before smartphones became so ubiquitous, two-factor authentication was a little arduous – it required dedicated devices which could generate one-time passwords (“OTPs”). Nowadays, however, it’s easy to load a simple app onto a smartphone (such as Google Authenticator; full disclosure – I work for Google, but the Authenticator app is open-source and based on open standards) and use it as an OTP-generating device.

On the server side, implementing two-factor support is actually very simple (for instance, about 20 lines of Python). There are even libraries for it in a number of common languages. You’re basically just storing a second, internally-generated password for each user that is used to verify their OTP codes.
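To show how little is involved, here is a sketch of time-based OTP verification using only Python's standard library. It implements TOTP (RFC 6238), which is built on HOTP (RFC 4226). The 30-second step, 6 digits, SHA-1, and one step of clock-drift tolerance in each direction are the common defaults used by authenticator apps, but treat them as assumptions. This is not the Google Authenticator source.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional


def new_secret() -> str:
    """Per-user shared secret, base32-encoded as authenticator apps expect."""
    return base64.b32encode(secrets.token_bytes(20)).decode("ascii")


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter...
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # ...then "dynamic truncation" picks 4 bytes based on the last nibble.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: the counter is the number of `step`-second intervals since the epoch.
    at = time.time() if at is None else at
    return hotp(key, int(at // step), digits)


def verify_totp(secret_b32: str, code: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, drift: int = 1) -> bool:
    key = base64.b32decode(secret_b32)
    at = time.time() if at is None else at
    counter = int(at // step)
    # Accept codes from `drift` steps either side to tolerate clock skew.
    for c in range(counter - drift, counter + drift + 1):
        if c >= 0 and hmac.compare_digest(hotp(key, c, digits), code):
            return True
    return False
```

A production version should also refuse to accept the same code twice. As the comments below point out, it should also rate-limit failed OTP attempts much more aggressively than password attempts, since a 6-digit code has only a million possible values.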

Email verification for unknown access locations

If you don’t utilize two-factor authentication, consider at least requiring out-of-band verification for access from a location that has never been seen before for a given user. This helps prevent malicious access to an account without significantly interfering with regular usage, especially if the user is already using two-factor authentication for their email account.
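Deciding what counts as a "location" is the fuzzy part. One rough sketch assumes that a location is the client's IPv4 /24 or IPv6 /48 network; this is a crude heuristic, not a standard. The helper names are hypothetical, and the actual out-of-band step (for example, an emailed link like the reset links above) is left to your own code.

```python
import ipaddress
from collections import defaultdict

# username -> networks from which that user has completed a verified login
_known_networks: dict[str, set[str]] = defaultdict(set)


def location_of(ip: str) -> str:
    """Coarsen an IP address to a network, so minor address changes don't count as new."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def needs_verification(username: str, ip: str) -> bool:
    """True if this user has never completed a login from this network."""
    return location_of(ip) not in _known_networks[username]


def mark_verified(username: str, ip: str) -> None:
    """Call once the out-of-band check (e.g. an emailed link) has succeeded."""
    _known_networks[username].add(location_of(ip))
```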

Account knowledge test for unknown access locations

This is similar to the previous item in that it involves an additional hurdle when accessing an account from a new location for the first time. In this case, you would ask the user to enter some piece of information about the account that a regular user of the account would easily know, but is not publicly derivable from just the login credentials. (For instance, a game’s website might ask for the name of a character on the specified account.) This helps prevent “drive by” intrusions where the attacker is trying out a bunch of stolen credentials (perhaps from another compromised site) but doesn’t actually know anything else about the account.

Don’t force frequent password changes.

In this context, “frequent” means more often than once a year or so. If you make someone change their password too often, it’s quite possible that they’ll resort to less secure means of remembering it, which is worse than not having changed it in the first place. If you do require occasional password changes, give the user warning when their current password is about to expire – ideally out-of-band (e.g. via email) so that they’re aware of the impending expiration even if they’re not actively logging in on a regular basis.

See Also

If you want to learn more about web application security, here are some other resources you can explore:

Posted on September 4, 2012, in Software Development. 10 Comments.

  1. Quick note on bcrypt and storing in the database. Most bcrypt implementations encode the string with a special prefix and choose a salt securely depending on operating system already. Same with scrypt. You probably should not be choosing salts yourself because you most likely will choose wrong.

    • A good point. I went into detail regarding database storage since I personally choose not to take sides in the “ignore all else, use bcrypt” debate, and thus someone might not be using bcrypt. ;)

  2. Great article, your upper limit section reminded me of something I implemented almost exactly like that last year for my internship. It had fast fail, temporary banning, etc.

    I agree with all the points made.

  3. Regarding your first paragraph about not mixing SSL and non-SSL access for session-based applications: what would you say to this idea? http://blog.novoj.net/2007/06/05/sdileni-session-mezi-protokoly-http-a-https/ (the article is in Czech, but knowing you are Czech too, I guess you wouldn’t have a problem reading it)

    • I am not actually Czech, but Google Translate seems to have done a fairly good job so I think I can respond. Basically, it depends on what your goals are for security. Anything which provides write access to user data should only be available on SSL – that means that if you’re using what is effectively a two-session-cookie approach (one secure cookie and one non-secure cookie), write access should require a valid secure cookie, and there should not be a way to derive the secure cookie from the insecure one. The simplest way to accomplish this is to just generate two different random session IDs, one for secure access and one for insecure access, and set one as a secure cookie and the other as a non-secure cookie.

      However, this means that someone can still hijack the non-secure cookie to read things as the user. Whether or not this is a problem depends on the functionality in question. As an example, it’d probably be fine to serve the home page of Reddit over HTTP (perhaps omitting the ‘recently viewed’ list); there’s no real sensitive user data on it, even for a logged-in user – just preferences. For the user’s message inbox, however, you’d almost certainly want to use HTTPS to prevent hijackers from reading private communications.

      • Oops… I feel a little embarrassed now. I was misled by the tweet chain and thought the author was Czech. I humbly apologize.

        Thank you for the answer – I must admit that nowadays we usually keep the whole site under HTTPS, because the performance hit for terminating SSL is less important now than it was several years ago. But for less important sites on shared hosting we still use this practice. And yes, everything that is sensitive must be served over HTTPS – but if there is a lot of public content and we just need to carry over the session ID in a secure way, this seems OK.

  4. A note about TOTP second factors that you might want to include. It’s very important to aggressively limit the number of attempts against the second factor code, since the probability is quite high of guessing right /eventually/. A number of failed attempts against it should probably result in notifying the user that their password may have been compromised.

    Also, on the subject of limiting by location, 10-100 per day can actually be very low due to the prevalence of NAT. This limit would hurt reasonably sized companies, as well as several countries that only have a few IPs.

    • Failed attempts should be tracked regardless of which failed – password or OTP. They’re both access tokens.

      The 10-100 per day number assumes you’re tracking failed attempts on a given account per location – not all failed attempts against all accounts from that location. Trying to limit attempts against all accounts from a single location is a bad idea, for the reasons you mention.

      • But if you’re only blocking a location after 10-100 attempts (even if you’re notifying the user), an attacker with a botnet will still have a very high chance of bypassing the second factor. OTP codes (as done by TOTP at least) should be more aggressively rate limited than passwords due to their much lower entropy. Additionally, repeated failed attempts against the second factor yield a very strong signal that the account is at high risk of compromise. This is the kind of situation where you should start considering locking the user’s account if you have a recovery system in place, or at the very least informing the user that their password has likely been compromised and should be changed (and if they reused their password elsewhere, they should change it there too).

      • That’s a good point. Something like 10 failed attempts where username and password were correct, but OTP was incorrect, should probably result in at the very least an emailed verification link before allowing further attempts.
