Several Dices To Rule Them All
In fact, our little rodent has several PRNG engines available in its ToolBox, inheriting from the TAesPrngAbstract class:
• TAesPrng is the main internal CSPRNG, using AES-CTR from several entropy sources;
• TSystemPrng redirects to the Operating System PRNG API, e.g. /dev/urandom on Linux or CryptGenRandom() on Windows;
• TOpenSslPrng wraps the OpenSSL PRNG API, if the library is available.
This document first details the mORMot cryptographic primitives, then the TAesPrng process
mORMot Cryptographic Primitives
You may know that AES, SHA1, SHA2, SHA3 and SHA512 standards are published by the library.
All those primitives will be used to construct a safe and efficient CSPRNG.
Their implementations share the following attributes:
• implemented in pure pascal or assembly, using HW acceleration if available (e.g. SSE4.2, AES-NI or SHA-NI);
• defined as low-level pascal classes, or high-level catalog using pascal interfaces;
• regression tests are run nightly from the mORMot repository to ensure its consistency.
Those regression tests consist in
• unit tests via reference vectors;
• core process verification - e.g. encrypt/decrypt consistency of random data;
• explicit performance regression;
• coverage of all code paths: pure pascal, plain assembly, SSE4.2/AES-NI assembly;
• comparison of all results between all available engines (including OpenSSL as reference);
• run on most supported platforms (several versions of Windows, Linux, MacOS on Intel/AMD and aarch64).
Detailed TAesPrng Process
Here is the description of mORMot PRNG, as implemented by the TAesPrng class in mormot.crypt.core.pas in the default gesUserOnly mode.
• A main TAesPrng instance is used, shared by all threads, but with thread-safe non blocking patterns (using ahead-of-time CTR reservation for thread safety);
• A one-shot TAesPrng instance can be used instead of the shared one, if needed.
• Each TAesPrng.Seed calls TAesPrng.GetEntropy (see next paragraph) to fill a 128 bytes buffer, which is then derived via at least 256 PBKDF2 rounds of HMAC-SHA-512 into a 512-bit transient output, which is then exploited in half;
• Lowest 256-bit are used to initialize the AES-CTR engine of this generator instance;
• Highest 256-bit are hashed to initialize a nonce / IV for the AES-CTR engine;
• CSPRNG data can then be retrieved via the standard AES-CTR pattern, which is a proven - and efficient - pattern;
• The engine is re-seeded after 32MB of output, i.e. 21-bit of CTR rounds - which is a very paranoid value in respect to the CTR and AES block and key sizes; on re-seed, the new state combines the previous state with some new TAesPrng.GetEntropy output;
• No seeding state could be forced in TAesPrng, so it is not sensitive to replays.
Using AES-CTR with 256-bit is a simple, but very efficient mean of getting randomness.
The generated output is just not truly "random": if you know the secret key and the IV, you could predict the next output. But in practice, as soon as those data are hidden and unknown, the output could be considered as "unpredictable". If you really want to avoid such a leak, you could just use your own transient TAesPrng instance: create it, get the random values, free it - the secret is hidden and the randomness is retrieved safe and fast enough. With AES-CTR 256-bit there is no possibility to guess the secret key from the generated output - at least with the current state of computer science. If this is broken, almost the whole Internet is broken, and we should know it soon enough, without even considering mORMot. This is the benefit of using standard algorithms: they are audited and scrutinized, so we don't rely on our own findings, but we share the potential weakness with others.
Entropy is Life
To any CSPRNG, entropy gathering is a very important step, and should be known and audited. In practice, the output results of our AES-CTR are as random as the entropy tends to be.
To be exhaustive, here is how TAesPrng.GetEntropy is implemented in the default gesUserOnly mode.
A SHA-3 SHAKE-256 instance is initialized then updated from the following sources:
1. At startup, in gesUserOnly mode, 512-bit of system entropy is retrieved once from Operating System standard sources, i.e. /dev/urandom or /dev/random under POSIX, or the CryptGenRandom() API on Windows – this content is used by point 5. below, and ensure some non-volatile memory is involved in the entropy gathering;
2. Executable and system information is used as salt/padding: os name and version, host, user, SMBios data, CPU cache layout, executable date, version and name (yes, mORMot knows all this, so let's use it!);
3. 512-bit of mORMot gsl_rng_taus2 Lecuyer random generator (one source seeded per thread);
4. 512-bit of entropy from the current thread context (stack values, TLS/pthread offset, Lecuyer state), RDRAND/RDTSC HW opcodes on Intel/AMD CPUs, CpuID flags, and basic RTL calls like Now or CreateGuid – i.e. a call to XorEntropy() from mormot.core.base.pas;
5. 512-bit of OS entropy, retrieved once at startup, is diffused in-place after each call for forward security, using AES-128-CTR with random IV and key (this is the default gesUserOnly mode, but OS could be called each time if needed);
6. 512-bit of entropy from several OS API: high-resolution timers, system state from /proc/* on Linux, CTL_HW on BSD, GetProcess*/GetSystem*() API on Windows - i.e. a call to XorOSEntropy() from mormot.core.os.pas;
Finally, the result buffer (initially filled with garbage bytes from heap) is ciphered via XOR using the SHAKE-256 instance in XOF mode for 128 bytes - we used this size to saturate the SHA-512 internal state.
More to Discover
As you can see, it may sounds a bit complicated, but in fact, the whole randomness generation path has been refined to produce good random numbers for your own projects, in a cross-platform way, and with very high performance (several GB/s on my PC)!
This implementation has been audited by several experts, and found good enough to be used on production.
Of course, for a very sensitive usage, like a prime number generation for RSA, we rely first on the audited Operating System library, then we XOR our own CSPRNG results. It won't hurt for sure.