New AesNiHash for mORMot 2

I have just committed some new AesNiHash32 AesNiHash64 AesNiHash128 Hashers for mORMot 2.

They are using AES-NI and SSE4.1 opcodes on x86_64 and i386.
This implementation is faster than the fastest SSE4.1 crc32c and with a much higher usability (less collisions).
Logic was extracted from the Go Runtime, and optimized for Delphi/FPC.

Purpose

Its purpose is NOT to replace crc32c or MD5 or SHA2 or SHA3. It is not to be used to compute a signature..
It is used internally by mORMot 2 to hash elements, e.g. strings, and store them into a Hash table - see e.g. our TDynArrayHashed or TSynDictionary.

In fact, the hashes are initialized with a new random AES key at startup, so every time a process is launched, the hashes will change. As a consequence, they should be used internally within the process, never store on disk..
But this random seeding is a good solution to resist to DOS attacks.

The Fastest 32/64/128-bit Hash on Delphi

Numbers on i386 and x86_64 are amazing: more than 15GB/s (GigaBytes!) on my Core i3.
Regular crc32c with SSE4.1 is 2-3GB/s and optimized crc32c from Intel is slightly slower. Tuned xxHash32 assembly is around 4GB/s, and I can't even mention how slow in comparison is the hash algorithm used by the Delphi RTL.

But the output of the AesNiHash is much better, since it has much less collisions than crc32c or xxHash32..
It is based on the AES permutation hardware instructions of modern Intel/AMD CPUs, for both safety and performance.

To be fair, performance on huge blocks is not the main point for a classical hashmap/hashtable use case. If your keys are numbers or strings, you won't hash MB of data for sure.
Therefore we tuned the AesNiHash performance from the very first byte. The smallest input length of 0-15 bytes are taken without any branch, 16-128 bytes have no loop, and, of course, if your key is huge, 129+ bytes are hashed with 128 bytes per iteration. And it passes all our regression tests just as good as our previous hashers.

Proven Origin

Proposing a new hash, and showing numbers is fine, but you should just not trust me: is this algorithm usable?
This time, we didn't reinvent the wheel, we re-implemented the Go RTL assembly, with some optimizations for our particular use. So it is a proven algorithm, used on production by Google and a lot of companies since years..
Check aeshashbody in Go runtime asm_amd64.s as reference.

Feedback is welcome on our forum!

Synopse Open Source

Purpose

The Fastest 32/64/128-bit Hash on Delphi

Proven Origin

Add ping