2021-08-17

mORMot 2 on Ampere AARM64 CPU

Last weeks, we have enhanced mORMot support to one of the more powerful AARM64 CPU available: the Ampere Altra CPU, as made available on the Oracle Cloud Infrastructure.

Long story short, this is an amazing hardware to run on server side, with performance close to what Intel/AMD offers, but with almost linear multi-core scalability. The FPC compiler is able to run good code on it, and our mORMot 2 library is able to use the hardware accelerated opcodes for AES, SHA2, and crc32/crc32c.

Continue reading

2021-05-08

Enhanced Faster ZIP Support in mORMot 2

The .zip format is from last century, back to the early DOS days, but can still be found everywhere. It is even hidden when you run a .docx document, a .jar application, or any Android app!
It is therefore (ab)used not only as archive format, but as application file format / container - even if in this respect using SQLite3 may have much more sense.

We recently enhanced our mormot.core.zip.pas unit:

  • to support Zip64,
  • with enhanced .zip read/write,
  • to have a huge performance boost during its process,
  • and to integrate better with signed executables.

Continue reading

2020-02-17

New move/fillchar optimized sse2/avx asm version

Our Open Source framework includes some optimized asm alternatives to RTL's move() and fillchar(), named MoveFast() and FillCharFast().

We just rewrote from scratch the x86_64 version of those, which was previously taken from third-party snippets.
The brand new code is meant to be more efficient and maintainable. In particular, we switched to SIMD 128-bit SSE2 or 256bit AVX memory access (if available), whereas current version was using 64-bit regular registers. The small blocks (i.e. < 32 bytes) process occurs very often, e.g. when processing strings, so has been tuned a lot. Non temporal instructions (i.e. bypassing the CPU cache) are used for biggest chunks of data. We tested ERMS support, but it was found of no benefit in respect to our optimized SIMD, and was actually slower than our non-temporal variants. So ERMS code is currently disabled in the source, and may be enabled on demand by a conditional.

FPC move() was not bad. Delphi's Win64 was far from optimized - even ERMS was poorly introduced in latest RTL, since it should be triggered only for blocks > 2KB. Sadly, Delphi doesn't support AVX assembly yet, so those opcodes would be available only on FPC.

Resulting numbers are talking by themselves. Working on Win64 and Linux, of course.

Continue reading