tl;dr: Entropy measures ignorance. It relentlessly increases in the universe, because random changes most likely lead to pure noise from which no pattern can be learned. This can explain a wide range of phenomena, including energy dissipation, password security, and even the existence of life.
Epistemic state: I've spent a lot of time thinking about entropy over the past years, as it regularly appears in my work in different forms. I'm quite confident in most points, though there are definitely some hot takes in here as well.
Ah, entropy... Everyone who knows me also knows that I've fallen in love with this concept. Why is my fridge so warm on the backside if it's supposed to cool things? Why is my phone battery dead again? And especially that of my old phone, which I didn't even use! Why is it so hard to undo a crack in glass or ceramics? Why does mixing a salad never end with the contents being sorted again? Why is Sudoku sometimes so hard to solve? How can a password with a handful of letters and symbols possibly be secure? And why is the future often so hard to predict? And, anyhow, why is there life in the universe? The answer to all of these is entropy. Lovely!
But first, what is entropy? Entropy is always defined with respect to a specific process yielding measurement results, and to a belief distribution over these possible outcomes. Suppose you're rolling a die, and every time you do, you get a new number. You can't say beforehand with certainty what the next outcome will be. However, if the die is biased towards some sides, you might have some information about it. Entropy is the amount of information you're missing to predict the next outcome perfectly, or, put in simpler words, entropy is how much you don't know. Entropy is a measure of ignorance.
Luckily, we already have a way to put an actual number on this "amount of information". Entropy is defined as $$ H(\textcolor{blue}{p}) = \textcolor{green}{-}\sum_{\textcolor{red}{x}} \textcolor{blue}{p(}\textcolor{red}{x}\textcolor{blue}{)} \textcolor{green}{\log_2} \textcolor{blue}{p(}\textcolor{red}{x}\textcolor{blue}{)} $$ Let's disassemble this heap of symbols into meaning: $\textcolor{red}{x}$ refers to an outcome drawn from the measurement process you care about. In our example, a die throw can yield any number from 1 to 6. The symbol $\textcolor{blue}{p}$ stands for the probability distribution, i.e., $\textcolor{blue}{p(}\textcolor{red}{x}\textcolor{blue}{)}$ tells you, for each possible $\textcolor{red}{x}$, how likely you think it is to observe it. For example, if we have a fair die, all outcomes $\textcolor{red}{x}$ have the same probability $\textcolor{blue}{p(}\textcolor{red}{x}\textcolor{blue}{)}=\frac 16$. The $\textcolor{green}{\log_2}$ stands for the logarithm, and its base tells us which unit of information we use, where base 2 means "bits". Curiously, base 2 allows us to reinterpret entropy as the average number of (optimally chosen) yes/no questions you need to ask to completely remove your uncertainty about the next outcome. Plugging the numbers from our fair-die example into the equation tells us that we need to ask on average approximately 2.6 yes/no questions to find out what the die shows. Back to the equation: the part $\textcolor{green}{-\log_2} \textcolor{blue}{p(}\textcolor{red}{x}\textcolor{blue}{)}$ is also called information gain or surprise, which makes entropy the expected surprise. Note also that entropy is maximal for the uniform distribution, that is, when all outcomes are equally likely and you have no idea what comes next. On the other hand, entropy is 0 if you already know the next outcome exactly.
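If you'd like to check these numbers yourself, here is a minimal Python sketch of the formula above; the function name `entropy` and the example distributions are my own, not taken from any particular library:

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability distribution given as a list."""
    return -sum(q * log2(q) for q in p if q > 0)

fair_die = [1/6] * 6
print(entropy(fair_die))    # ~2.585 bits: the ~2.6 yes/no questions from above

loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
print(entropy(loaded_die))  # ~2.161 bits: the bias means you're missing less information
```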
A world of possibilities
Entropy is high when there are lots of possible outcomes and you have little idea which one you're going to find. Let's take a look at another example: passwords. The basic idea of generating a secure password is to have a number of possible passwords so huge that it is practically impossible to enumerate them all, and then draw from them with equal probability. In other words, we want to confront adversaries with such a high entropy that they can't overcome it. Let's generate such a password: Take the letters a to z, numbers 0 to 9, and a handful of punctuation symbols — maybe ;:.,!? — and draw from these 42 characters 6 times with equal probability, e.g., '1wpt?z'. This already gives 5.5 billion possibilities! We can use the equation above to find out how much an adversary doesn't know: about 32.4 bits. While 5.5 billion might seem a lot, a modern computer can quickly enumerate all of them. However, if we increase the password length to 12, we already get $42^{12} \approx 3 \cdot 10^{19}$ possible passwords! That's a lot! An adversary now has to figure out about 64.7 bits, which is far beyond what modern computers can enumerate. Notice how the number of possible passwords grows exponentially with the password length, while entropy grows only linearly.
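As a quick sanity check of these numbers, here is a hedged Python sketch; the 42-character alphabet matches the example above, while the function names `make_password` and `password_entropy_bits` are just my own illustration around Python's standard `secrets` module:

```python
import secrets
from math import log2

# a to z, 0 to 9, and the six punctuation symbols from the example above
alphabet = "abcdefghijklmnopqrstuvwxyz" + "0123456789" + ";:.,!?"
assert len(alphabet) == 42

def make_password(length):
    # each character is drawn independently and uniformly at random
    return "".join(secrets.choice(alphabet) for _ in range(length))

def password_entropy_bits(length):
    # independent uniform draws, so the per-character entropies simply add up
    return length * log2(len(alphabet))

print(make_password(6), password_entropy_bits(6))    # something like '1wpt?z', ~32.4 bits
print(make_password(12), password_entropy_bits(12))  # twice as long, ~64.7 bits
```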
So, when setting a password, make sure it has high entropy, instead of merely "looking complicated". Though, honestly, nobody can say it better than xkcd#936:
Unstoppable ignorance
You cannot talk about entropy without talking about the second law of thermodynamics. It states that, statistically, the entropy of an isolated system never decreases over time $$ \frac{d}{dt} S(t) \geq 0 $$ where $S(t)$ refers to the entropy of the system at time $t$. In the context of thermodynamics, entropy is a measure of energy dissipation, dependent on other thermodynamic properties such as internal energy and temperature.
Take for example a cup of hot coffee, to which you add some milk. In the moment right after the milk hits the surface of the coffee, both substances are still separated and at different temperatures. Then, as the milk settles down and starts to diffuse into the coffee, interesting and complicated swirls may appear. After some time, both liquids have completely merged into each other and form a homogeneous mixture. The system has reached maximum entropy, a state also called thermal equilibrium.
How does this fit together with entropy being a measure of ignorance? Notice how in the beginning we could still tell where the milk is and where the coffee. There were emergent properties that allowed us to draw boxes around each substance based on its location or its temperature. Afterwards, it is very hard to tell where the milk is and where the coffee. The system has lost this emergent property. For any region of space, we'd now need to look extremely closely to tell whether a given molecule is a "coffee molecule" or a "milk molecule". They are uniformly distributed now. We have lost information. Entropy has increased.
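To watch this happen, here is a toy, coarse-grained "cup" in Python; it is entirely my own construction rather than a physical simulation: milk on top, coffee at the bottom, and random swaps doing the stirring. The average entropy of "which kind of molecule sits in this region?" climbs from 0 towards its maximum of 1 bit per region:

```python
import random
from math import log2

def binary_entropy(p):
    # ignorance (in bits) about whether a molecule picked from a region is milk
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def coarse_entropy(cup, bins=10):
    size = len(cup) // bins
    chunks = [cup[i * size:(i + 1) * size] for i in range(bins)]
    return sum(binary_entropy(sum(chunk) / size) for chunk in chunks) / bins

cup = [1] * 500 + [0] * 500  # 1 = milk on top, 0 = coffee below: perfectly separated
for step in range(20001):
    if step % 5000 == 0:
        print(step, round(coarse_entropy(cup), 3))  # rises from 0.0 towards ~1.0
    i, j = random.randrange(len(cup)), random.randrange(len(cup))
    cup[i], cup[j] = cup[j], cup[i]  # a random "stir"
```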
The point is that almost all possible states look random, that is, they don't show any emergent properties. As a consequence, if you randomly change the state of a system, you will most likely end up in a high-entropy state. This is why it is so hard to keep heat energy outside a fridge, or a battery charged. This is also why mixing a salad never ends with all the ingredients neatly separated again. And why sorting trash is so important. In fact, all processes that keep entropy low in a subsystem can only do so by expending external effort and thus increasing entropy somewhere else.
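A back-of-the-envelope count makes the "almost all states look random" point concrete; the 10 milk and 10 coffee molecules below are my own toy numbers:

```python
from math import comb

total = comb(20, 10)        # all ways to place 10 milk molecules among 20 slots: 184756
tidy_states = 2             # milk-then-coffee, or coffee-then-milk
print(tidy_states / total)  # ~0.00001: a random rearrangement almost never looks sorted
```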
If entropy is increasing over time, then it must have been lower in the past. According to one of the leading theories about the beginning of the universe, everything started out in a very hot, almost uniformly distributed state. How is this low entropy? The answer is gravity. Thermodynamics implicitly ignores gravity, which is a great approximation for some gas in a human-operable box on Earth, but not for the whole universe. In fact, it turns out that most of the entropy in the universe is actually in black holes.
For this reason, I find it very misleading to think of entropy as a measure of "disorder" or "chaos". Instead, think of it as a measure of ignorance. I like to think of a black hole as the ultimate encryption machine, ensuring that every bit of information that falls in remains inaccessible forever. So we can reinterpret the second law of thermodynamics as saying that information gets increasingly inaccessible with time. And since this is a physical law, there is ultimately no way around it.
Confidence and humility
We have seen that entropy measures the ignorance arising from a probability distribution that is smoothed out over a vast number of possibilities, and that entropy relentlessly continues to increase in the universe. There is, however, a property of entropy that really bugs me: it is always defined relative to someone who measures the next outcome. This "someone" is reminiscent of the one in the quantum measurement problem (which is going to be a topic of another post), and it causes me a very similar headache: we wanted to describe an objective property of the state of a system, but if you look closely enough, this property refuses to be objective, and all that's left is a subjective measure of ignorance.
It is intrinsically subjective, because the assigned probabilities are. Even if you think you're just keeping track of how often some measurement result appears, it's still you who is keeping track, and, more importantly, it's you who is observing the results in the first place. This shift in paradigm, understanding probabilities as subjective beliefs, is called the Bayesian interpretation.
Without going into much detail about how Bayesian reasoning works: it makes the best possible use of incoming information. It updates your beliefs relative to how surprised you are by new observations. That is, it asks you both to learn from every bit of information you get and, at the same time, not to assume anything beyond that. It asks you to maximize the entropy of your beliefs while respecting what you already know. It calls on you to have confidence in what you've learned and humility about what you don't know.
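To make this a little more tangible, here is a hedged Python sketch of a Bayesian update; the two hypotheses and their likelihoods are invented for illustration. We start from the maximum-entropy (uniform) belief and watch how the entropy of the belief evolves as die rolls come in:

```python
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

# Two hypotheses about the die: it is fair, or it is loaded towards six.
likelihood = {
    "fair":   [1/6] * 6,
    "loaded": [1/10, 1/10, 1/10, 1/10, 1/10, 1/2],
}
belief = {"fair": 0.5, "loaded": 0.5}  # maximum-entropy prior: assume nothing more

for roll in [6, 6, 3, 6, 6]:           # observed faces
    for h in belief:
        belief[h] *= likelihood[h][roll - 1]  # weight each hypothesis by how well it predicted the roll
    total = sum(belief.values())
    belief = {h: w / total for h, w in belief.items()}
    print(roll, round(belief["loaded"], 3), round(entropy(belief.values()), 3))
# The entropy of the belief generally shrinks as evidence accumulates, though a roll
# that favours the underdog hypothesis (the 3 here) nudges it back up: confidence in
# what you've learned, and the remaining entropy is the humility about the rest.
```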
Let there be life
Increasing entropy is the driving force behind the evolution of everything in the universe. It is the reason we can distinguish the future from the past. At maximum entropy, everything would look random: no star would shine anymore, and no pattern or concept could be learned.
In fact, life exists because it provides a faster way to increase entropy. We disperse the high-energy photons coming from the sun into many more, but lower-energy, photons that the Earth sends back into the universe. This is what allows life on Earth to flourish, to find better and more efficient ways to exploit this entropy market gap. And, ultimately, the same force that enables life also guarantees our eventual death. We are complex phenomena arising in the universe on its way to equilibrium. We are the swirls in the coffee :)
Even though we are here because evolution found ever more sophisticated ways to increase entropy more efficiently, it doesn't mean we have to follow its command. Instead, we can use these insights to shape our lives as we truly want to live them, respect the larger forces and life itself, and make the best of our finite time in this amazing Universe ✨