Token entropy explained: what is token entropy?
Security and identity people just love technical jargon. There's tons of it. Some is pretty banal, like identity federation. Some of it sounds neat, like tabnabbing. It's pretty much all impenetrable, though.
I'll try to dispel confusion around one particular bit of jargon: token entropy. You might have heard this before. If you're lucky, you might have figured it out from context clues. But if you're like most of us, you just want someone to explain it simply.
I'll do my best here. Let's start with a TL;DR: token entropy just means randomness!
What is token entropy?
Okay, so token entropy breaks down into two pretty obvious parts: token and entropy. Let's take them one by one. We'll start with token and then talk about entropy.
What is a token?
This one's kind of tough.
"Token" is kind of a generic container term. It doesn't intrinsically mean very much.
If you're a software engineer, you might be familiar with these kinds of abstract terms. You might talk about "resources" or "services" or "objects" at work. Pressed to explain exactly what those mean, you would probably have a hard time. Well, "token" is a bit like that!
For those of you who aren't software engineers, imagine I'd said "thingie." That's not a very descriptive term, but sometimes that's exactly what you want! There are cases where you just need a bit of a placeholder word.
"Token" in this context just means a representation of authentication or authorization status. (See: AuthN vs. AuthZ) It's often just a chunk of data that we link to a user.
Here's an example using a particular kind of token, called a JSON Web Token (JWT):
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ik5lZCBPJ0xlYXJ5IiwibXNnX3RvX2Jsb2dfcmVhZGVyIjoiaGVsbG8iLCJpYXQiOjE1MTYyMzkwMjJ9.ChPEZQ5cI207MIvHl2S3-LmQaOKSTs8ppV_pjRhqsOk
This isn't the only kind of token out there, to be clear. In fact, there are lots! But the general idea is the same -- a token is a representation of authentication or authorization status.
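To make the "chunk of data" idea concrete, here's a minimal Python sketch that splits the example JWT above into its three dot-separated parts and decodes the first two. The helper name decode_segment is my own, not part of any library, and the sketch only reads the token; it does not verify the signature.

```python
import base64
import json

# The example JWT from this article, split across lines for readability.
TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ik5lZCBPJ0xlYXJ5IiwibXNnX3RvX2Jsb2dfcmVhZGVyIjoiaGVsbG8iLCJpYXQiOjE1MTYyMzkwMjJ9."
    "ChPEZQ5cI207MIvHl2S3-LmQaOKSTs8ppV_pjRhqsOk"
)


def decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JSON segment (hypothetical helper)."""
    # JWTs drop base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# A JWT is header.payload.signature; we leave the signature untouched here.
header_b64, payload_b64, _signature_b64 = TOKEN.split(".")
header = decode_segment(header_b64)
payload = decode_segment(payload_b64)

print(header)
print(payload)
```

Notice that the header and payload are only encoded, not encrypted, so anyone holding the token can read them. Real code should check the signature with a proper JWT library before trusting anything inside.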
What is entropy (in security)?
There's an idea in physics called "entropy." I'm not sure I could give you a great explanation here. It's just one of those foundational concepts that defies intuition; it means an awful lot of different things. On some level, it's useful to understand (the physics version of) entropy as a measurement of how disordered or chaotic a system is.
Good news, though. Entropy in software security isn't nearly as subtle. It's still pretty complicated, but it's less metaphysical than its analogue in physics.
Entropy in security basically just means randomness. If we say something is "high entropy," we just mean that it's really, really random. We mean that it's hard to guess.
If we say that a given token is high entropy, that means it's selected from a very, very large number of possible states. It's just one possible configuration selected -- at random -- from an overwhelming number of similarly likely configurations.
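Here's a rough sketch of what "selected at random from an overwhelming number of configurations" can look like in practice. It uses Python's standard secrets module, which draws from the operating system's cryptographically secure random source. The function name and the 32-byte default are my own assumptions for illustration, not a recommendation from any particular standard.

```python
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes random bytes.

    32 bytes is 256 bits, so there are 2**256 equally likely tokens.
    The default size is an assumption chosen for this example.
    """
    return secrets.token_urlsafe(num_bytes)


token = generate_token()
print(token)
print(f"possible tokens: 2**{32 * 8}")
```

The important design choice is the random source. A general-purpose generator like Python's random module is not meant for secrets, so a token built from it may be far easier to guess than its length suggests.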
Entropy in passwords
Let's back up from 'tokens' for just a moment. We can apply the same idea of entropy to passwords, which are pretty familiar.
Suppose you require your users to set up a four-digit PIN as their password, selecting each digit from the numbers 0 through 9. We could consider this pretty low entropy, because there are only 10^4 (that is, 10,000) possible PINs. If you spent all day guessing PINs, you'd eventually get it right. It would take a computer less than a second to generate all of the possible PINs.
By contrast, imagine you require a 30-character password comprising a mix of uppercase letters, lowercase letters, numbers, and special characters. If we count the 94 printable ASCII characters, such a password has 94^30 possible configurations, or about 1.6 * 10^59. A computer would not be able to enumerate all such configurations even if it ran for the entire age of the universe.
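If you'd like to check that arithmetic yourself, here's a small Python sketch. It assumes every character is chosen independently and uniformly at random, and that the long password draws from the 94 printable ASCII characters (26 uppercase, 26 lowercase, 10 digits, 32 symbols). The guessing rate and the age of the universe in seconds are rough figures I picked for illustration.

```python
import math


def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible secrets of this length over this alphabet."""
    return alphabet_size ** length


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits, assuming uniform and independent characters."""
    return length * math.log2(alphabet_size)


pin_space = search_space(10, 4)
password_space = search_space(94, 30)

print(f"PIN:      {pin_space} options, {entropy_bits(10, 4):.1f} bits")
print(f"Password: {password_space:.2e} options, {entropy_bits(94, 30):.1f} bits")

# Assumed figures: a trillion guesses per second, and roughly
# 13.8 billion years (about 4.35e17 seconds) since the Big Bang.
GUESSES_PER_SECOND = 1e12
AGE_OF_UNIVERSE_SECONDS = 4.35e17

seconds_needed = password_space / GUESSES_PER_SECOND
print(f"Universe lifetimes needed: {seconds_needed / AGE_OF_UNIVERSE_SECONDS:.2e}")
```

Security folks usually quote entropy in bits rather than raw counts. Each extra bit doubles the number of guesses an attacker needs, which is why the gap between roughly 13 bits and roughly 197 bits is so enormous.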
Entropy in tokens
We often need tokens to be secrets, just like passwords. So when we're talking about entropy in tokens, the idea is exactly the same.
"Entropy" in tokens just describes how hard they'd be to guess. It's a measure of randomness. If a token is high entropy, that means it's very difficult to guess.
How to estimate the entropy of a token
Calculating the entropy of a token is kind of involved. It's not so hard when you get the hang of it, but it requires explanation of some combinatorics that's out of scope here.
Fortunately, Wolfram Alpha has a really helpful tool for this! If you have an example of a token (please don't use a real secret token), you can use Wolfram Alpha to estimate its entropy. I put together an example query to estimate token entropy.
That should cover your bases!
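If you're curious what a back-of-the-envelope estimate might look like in code, here's a deliberately naive Python sketch. To be clear, this is my own simplification and not how Wolfram Alpha computes anything. It guesses the alphabet from the character classes that appear in a sample token and assumes every character was picked uniformly at random, so it gives an optimistic upper bound. Structured tokens such as JWTs will be badly overestimated, and, as above, please don't paste a real secret into it.

```python
import math
import string


def estimate_entropy_bits(token: str) -> float:
    """Naive upper-bound entropy estimate for a sample token, in bits.

    Assumes each character was chosen uniformly and independently from
    the ASCII character classes observed in the token. Characters
    outside those classes are ignored when sizing the alphabet.
    """
    pool = 0
    if any(c in string.ascii_lowercase for c in token):
        pool += 26
    if any(c in string.ascii_uppercase for c in token):
        pool += 26
    if any(c in string.digits for c in token):
        pool += 10
    if any(c in string.punctuation for c in token):
        pool += 32
    if pool == 0:
        return 0.0
    return len(token) * math.log2(pool)


# Made-up sample values, not real secrets.
print(f"{estimate_entropy_bits('1234'):.1f} bits")
print(f"{estimate_entropy_bits('Xk29fLq0PzR7mVb3'):.1f} bits")
```

The real entropy of a token depends on how it was generated, not on what it looks like. A token that looks random but came from a predictable process can score well here and still be easy to guess.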
Tesseral: secure auth for SaaS
Tesseral is open source auth infrastructure for SaaS applications. It includes everything you need to manage logins and identity at any scale.
For example, it includes a service that can manage API keys for you. That is, if you expose a public API to your customers, you need to use secure API keys to authenticate your customers' requests. Tesseral can manage the entire lifecycle of your API keys and make sure that they're extremely high entropy (i.e., functionally impossible to guess at random).
See our related article: API Key Management Service: What It Is, Why It Matters, and How to Choose One.