What is entropy and why should you care?
It’s been some time since I started programming. Back then, Turbo Pascal was one of the most common development environments. I still recall the joy of learning about a new method or API and the cool things that could be done with it. Everything seemed fascinating, and there seemed to be so many possibilities.
Ironically, technology was a lot more limited back then. Even worse, information did not spread very well, since the internet was still in its early days.
But let’s get back to one of the programming lessons that I’ve learned around that time, which dealt with random numbers.
In Turbo Pascal, like in many languages, there was a Random() function which could be used to generate pseudo-random numbers. I started playing around with it, but after a few tests the limitation was obvious: the numbers seemed random, but every time the program ran, the same numbers were generated, in the same order.
It turned out there was another function called Randomize(), which solved this problem. Just by calling it first, the numbers returned by Random() seemed more random and no longer repeated from one execution to the next.
Behind the scenes, Randomize() simply initialized the random number generator with the current time. Based on this initial value, the algorithm behind Random() would generate a sequence of pseudo-random numbers.
This approach would be considered bad practice nowadays. The values returned by the system clock can be guessed from logs and other observable events. In a distributed environment, two instances of the algorithm could also end up with the same seed (initial value), which would mean that the same random numbers are generated on multiple application instances.
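The seeding behaviour described above can be reproduced in a few lines of Python. This is only a sketch: Python's random module uses a different algorithm than Turbo Pascal's Random(), and the helper name sequence() and the seed value 42 are made up for illustration. The point is that a generator seeded with the same value always produces the same numbers, and that two instances seeded from the same clock reading behave identically.

```python
import random
import time

def sequence(seed, count=5):
    """Return `count` pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, leaves global state alone
    return [rng.randint(0, 99) for _ in range(count)]

# A seed that never changes gives the same sequence on every run.
fixed_a = sequence(42)
fixed_b = sequence(42)

# Clock-based seeding: two "instances" that read the same clock value
# receive the same seed and therefore the same numbers.
now = int(time.time())
instance_1 = sequence(now)
instance_2 = sequence(now)

print(fixed_a == fixed_b)        # True
print(instance_1 == instance_2)  # True
```

Anyone who can guess the clock value used as the seed can regenerate the whole sequence in the same way.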
Cryptography relies heavily on random numbers that are very hard to predict. The term entropy is used to measure how unpredictable random numbers are, and hence their quality. Low entropy means that the numbers can be predicted easily, while high entropy means the opposite.
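One common way to put a number on this is Shannon entropy, which can be estimated from how often each value occurs in a sample. The sketch below is only an illustration: the function name shannon_entropy_bits() and both sample inputs are invented here, and a frequency count like this is not how an operating system estimates the entropy of its sources.

```python
import math
from collections import Counter

def shannon_entropy_bits(data: bytes) -> float:
    """Empirical Shannon entropy of `data`, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of p * log2(1/p) over every byte value that occurs in the sample.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

constant = b"\x00" * 1024        # a single repeated value: fully predictable
uniform = bytes(range(256)) * 4  # every byte value equally frequent

print(shannon_entropy_bits(constant))  # 0.0
print(shannon_entropy_bits(uniform))   # 8.0
```

Note that the second sample is just a counter repeated four times, so it is trivially predictable even though it scores the maximum of 8 bits per byte. A frequency-based estimate is at best an upper bound; it cannot show that a sequence is hard to predict.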
Why Should You Care?
As it turns out, entropy plays a very important role both for applications and for the operating system. Gone are the days of Turbo Pascal, when we were running DOS applications from the command line and sharing files on floppy disks. Operating systems and applications now rely heavily on networking, and sharing information securely over a network usually involves cryptography.
The demand for entropy has increased a lot, but good quality random numbers cannot be generated easily. Sources of entropy other than the clock need to be used, such as the keyboard, the mouse or other hardware related events, especially from noisy sources. Due to the low level nature of these operations, random numbers are usually generated by the operating system and exposed to applications through various streams.
Depending on the operating system, there are two kinds of random number sources:
- non-blocking sources, which guarantee that clients will never have to wait for a random number to be available; in general, these have traditionally been regarded as less secure, since the entropy data is stored in a pool and reused
- blocking sources, which need to be fed constantly from entropy sources; when no more entropy data is available, clients need to wait until enough data is collected to generate the next value
Non-blocking sources are considered less reliable from a security point of view, but they can be used in simple scenarios where security is not that important. Blocking sources are preferable, but they can be depleted easily, especially in server environments.
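From an application's point of view, these operating system sources are usually reached through a library call rather than by opening a stream directly. As a minimal sketch in Python (the variable names are illustrative), os.urandom() and the secrets module both ask the operating system for random bytes; whether the call can block depends on the operating system and the Python version.

```python
import os
import secrets

# os.urandom asks the operating system's random source for raw bytes.
key = os.urandom(16)

# secrets builds on the same OS source and is intended for tokens and keys.
token = secrets.token_hex(16)  # 16 random bytes rendered as 32 hex characters

print(len(key), len(token))  # 16 32
```

Unlike a generator seeded from the clock, these values cannot be reproduced by re-running the program with a known seed.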
Servers usually run headless, with no keyboard or mouse attached. When virtualization is used, things only get worse, since the hardware is one more layer away and most often not exposed directly to the guest. This significantly reduces the entropy available to the guest operating system, which affects overall performance, since random numbers become harder to generate.
What Can You Do About It?
The options depend a lot on the operating system. Entropy is often mentioned in the Linux environment, since there are more options available. Some operating systems (e.g. Windows) avoid the performance issues caused by blocking random number sources and expose only non-blocking ones. Unfortunately, this raises the risk of using predictable random numbers for cryptography, thus affecting the overall security of the system and the data involved.
On Linux environments, you can determine the available entropy with:
# cat /proc/sys/kernel/random/entropy_avail
A low number (below roughly 100-200) is a sign of trouble. Performance is usually affected in this case, typically showing up as slow response times for encrypted network connections.
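The same check can be scripted, for example as part of a monitoring job. The sketch below reads the file shown above from Python; the function names and the LOW_WATERMARK constant are made up for illustration, and the threshold of 200 is simply the upper end of the range mentioned above, not an official limit.

```python
from pathlib import Path

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")
LOW_WATERMARK = 200  # assumption: upper end of the 100-200 range from the text

def available_entropy(path=ENTROPY_PATH):
    """Return the kernel's entropy estimate, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, file missing, or unexpected content

def is_low(value, threshold=LOW_WATERMARK):
    """True when a reading exists and falls below the threshold."""
    return value is not None and value < threshold

value = available_entropy()
if value is None:
    print("entropy estimate not available on this system")
else:
    print(f"entropy_avail={value} low={is_low(value)}")
```

On systems without this file, the function returns None instead of raising, so the script can run unchanged on other platforms.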
There are two ways of increasing the available entropy:
- Hardware random number generators
- Software level random number generators
Hardware level solutions are always preferable, but they increase the cost and the complexity of the infrastructure.
For software level random number generators there are multiple options. In Linux environments, haveged is usually used. The setup is quite simple. For Fedora:
yum install haveged
chkconfig haveged on
For other distributions, install the haveged package using your package manager and make sure it starts up at boot.