Why OpenSSL? OpenSSL is the reference library for cryptography and secure TLS/HTTPS communication. It is part of most Linux/BSD systems, and covers a lot of use cases and algorithms. Even though it has had some vulnerabilities in the past, it has been audited and validated for business use. Some algorithms […]
Last week, I committed new ASM implementations of our AES-PRNG, AES-CTR and AES-GCM for mORMot 2.
They handle eight 128-bit blocks at once in an interleaved fashion, as permitted by the CTR chaining mode. The AES-NI opcodes (aesenc, aesenclast) are used for the AES process, and the GMAC of the AES-GCM mode is computed using the […]
Resulting performance is amazing: on my simple Core i3, I reach 2.6 GB/s for aes-128-ctr and 1.5 GB/s for aes-128-gcm, for instance - the first being actually faster than OpenSSL!
I have just committed some new AesNiHash32, AesNiHash64 and AesNiHash128 hashers for mORMot 2. They use AES-NI and SSE4.1 opcodes on x86_64 and i386. This implementation is faster than the fastest SSE4.2 crc32c, with a much better hash quality (fewer collisions). Logic was extracted from the Go […]
EKON 24 just finished. "The conference for Delphi & more" was fully online this year, due to the pandemic context... But it was a great event, and I am very happy to have been part of it. Please find the slides of my two sessions: mORMot 2 Performance: from Delphi to AVX2 Of course, […]
As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.
It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, with some tricks unique in the Delphi/FPC Memory Manager world.
It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version: instead of AVX, we use good old plain (non-temporal) SSE2 opcodes, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest benefit of this memory manager - it leverages a killer feature of the Linux kernel. By the way, we call the kernel directly, without going through libc.
We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled and optimized multi-threaded use, especially by adding some additional small blocks for GetMem (a less expensive notion of the "arenas" used in FastMM5 and most C allocators), introducing an innovative and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.
It runs all our regression tests with high performance and stability - including multi-threaded tests with almost no slowdown: sleep time is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.
2018-11-12. Pascal Programming
I've uploaded two sets of slides from my presentations at EKON 22: Object Pascal Clean Code Guidelines Proposal, and High Performance Object Pascal Code on Servers, with the associated source code. The WorkShop about "Getting REST with mORMot" has a corresponding new Samples folder in our […]
In the last weeks/months, we worked a lot with FPC.
Delphi is still our main IDE, due to its better debugging experience under Windows, but we aim to provide premium support for FPC on all platforms, especially Linux.
The new Delphi Linux compiler is out of scope: it is heavily priced, its performance is not so good, and ARC broke memory management, so it would need a deep review/rewrite of our source code, which we can't afford - since we have FPC, which is, in our opinion, a much better compiler for Linux.
Of course, you can create clients for Delphi Linux and FMX, as usual, using the cross-platform client parts of mORMot. But for server side, this compiler is not supported, and will probably never be.
You probably know about our SynLZ compression unit, in pascal and x86 asm, which is very fast for compression with a good compression ratio, and proudly competes with LZ4 or Snappy. It is used in our framework everywhere, e.g. for WebSockets communication, for ECC encrypted file content, or to […]
A lot of our code, and probably yours, relies heavily on text processing. In our mORMot framework, most features use JSON text, encoded as UTF-8. Profiling shows that a lot of time is spent computing the end of a text buffer, or comparing text content. You may know that in its SSE4.2 feature […]
2015-06-21. Pascal Programming
Almost every time I'm debugging some core part of our framework, I like to look at the generated asm, and try to optimize the pascal code for better speed - when it is worth it, of course! I just made a nice observation when comparing the assembler generated by Delphi to FPC's output. Imagine you […]
We have just included some optimized x64 assembler in our Open Source SynCrypto.pas unit, so that SHA-256 hashing will perform at best speed.
It is an adaptation of Intel's tuned assembly macros, which make use of the SSE4 instruction set, if available.
2015-01-15. Open Source
Today, we committed a new patch to enable AES-NI hardware acceleration to our SynCrypto.pas unit. Intel® AES-NI is a new encryption instruction set that improves on the Advanced Encryption Standard (AES) algorithm and accelerates the encryption of data on newer processors. Of course, all this is […]
Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking.
There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks.
We just introduced in mORMot's core unit (SynCommons.pas) a fast and efficient crc32c() function.
It will use either:
- Optimized x86 asm code, with unrolled loops;
- SSE 4.2 hardware crc32 instruction, if available.
Resulting speed is very good.
This is for sure the fastest CRC function available in Delphi.
Note that there is a version dedicated to each of the Win32 and Win64 platforms - both perform at the same speed!
In fact, most popular file formats and protocols (Ethernet, MPEG-2, ZIP, RAR, 7-Zip, GZip, and PNG) use the polynomial $04C11DB7, whereas Intel's hardware implementation is based on another polynomial, $1EDC6F41 (used in iSCSI and Btrfs).
So you would not use this new crc32c() function to replace zlib's crc32() function, but rather as a convenient, very fast hashing function at application level.
For instance, our TDynArray wrapper will use it for fast item hashing.
2013-12-05. Open Source
Do you remember this former article about scalability of the Delphi memory manager, in multi-thread execution context?
Our SynScaleMM is still experimental.
But it did pretty well, for an experiment!
First, you can take a look at ScaleMM2, which is more stable, and based on the same ground.
But a new multi-thread friendly memory manager for Delphi just came out.
It is in fact the anonymous (and already famous) "NN memory manager" Primož talked about in his article about string building and memory managers.
(Note that in this article, our SynScaleMM was found to scale very well, but on the other hand, Primož compiled his benchmark program in Debug mode, so our TTextWriter was not in good shape: when you compile in Release mode, optimizations and inlining are ON, and our good TTextWriter just flies... See the note at the beginning of the article - this is why I never find those benchmarks very informative. I always prefer profiling real-world, useful processes… and was never convinced by any such naive benchmark.)
OK, back to our business!
SapMM is an interesting beast.
It sounds as if Alexei (the initial coder) has a C coding background. But that's fine when you have to deal with low-level structures and algorithms, as required by a memory manager.
It features everything we may ask of such a piece of code: clear design, optimized code (mostly through inlining), memory leak reporting, and some parameters for tuning.
It is only for Delphi XE (and up) under Win32 for now, but contributors are […]
It has been used in production for more than half a year, and it passed all FastcodeMM benchmark tests.
If you want a direct link to today's source code, without SVN, you may try this direct link from […] (but it will probably never be updated - you are warned).
2013-05-21. Pascal Programming
Apart from being very slow during compilation, the Delphi NextGen compiler introduced a new memory model, named ARC.
We already spoke about ARC years ago, so please refer to our corresponding blog article for further information, especially about how Apple introduced ARC to iOS instead of the Garbage Collector model.
About how ARC is to be used in the NextGen compiler, take a look at Marco's blog article, and its linked resources.
But the ARC model, as implemented by Embarcadero, has at least one huge performance issue, in the way weak references and zeroing weak pointers have been implemented.
I am not talking about the general slowdown introduced during every class/record initialization/finalization, which is noticeable, but not a big concern.
If you look at XE4 internals, you will discover a disappointing global lock introduced in the RTL.
2013-03-13. Pascal Programming
We have included x64 optimized asm of FillChar() and Move() for Win64, i.e. for the Delphi XE2 and XE3 compiler targets. It properly handles cache prefetch and uses appropriate SSE2 move instructions. The System.pas unit of the Delphi RTL will be patched at startup, unless the NOX64PATCHRTL […]
I'm happy to announce that mORMot units are now compiling and working great in 64 bit mode, under Windows.
You need a Delphi XE2/XE3 compiler, of course!
ORM and services are now available in Win64, on both client and server side.
Low-level x64 assembler stubs have been created, tested and optimized.
The UI part is also available: grid display, reporting (with PDF export and display anti-aliasing), ribbon auto-generation, SynTaskDialog, i18n... the main SynFile demo just works.
Overall impression is very positive, and speed is comparable to the 32 bit version (only 10-15% slower).
The speed decrease seems to be mostly due to the doubled pointer size, and some less optimized parts of the official Delphi RTL.
But since mORMot core uses its own set of functions (e.g. for JSON serialization, RTTI support, or interface calls and stubbing), we were able to unleash the whole 64 bit power of your hardware.
The Delphi 64 bit compiler seems stable and efficient, even when working at a low level, with assembler stubs.
The generated code seems more optimized than the one emitted by the Free Pascal Compiler - and the RTL is very close to the 32 bit mode.
Overall, VCL conversion was as easy as a simple rebuild.
Embarcadero's people did a great job for VCL Win64 support, here!
2012-12-20. Pascal Programming
I was looking for a faster IntToStr implementation and discovered SynCommons.pas.
That's really too bad, SynCommons.pas really does contain some seriously fast stuff, people would greatly benefit from it if it was made general-purpose.
In fact, it would not be enough to change the RTL function
IMHO, to write something scalable, you need to get rid of such functions.
2012-11-13. Pascal Programming
It is a strongly-typed, compiled, cross-platform, and concurrent language.
It features some nice high-level structures, like maps and strings, while still giving very low-level access to the generated code: pointers are there, in a safe strong-typed implementation just like in pascal, and there is even a "goto", which sounds like a heresy to dogmatic coders, but does make sense to me, at least when you want to optimize code speed, in some rare cases.
It was created and is pushed by Google, is used internally by the company in their computer farms, and was designed by one of the original C creators.
2012-04-10. Pascal Programming
One potential issue with Delphi coding is how function results are implemented.
If you forget to set a result value in a function, you'll get a compiler warning.
Never underestimate such warning: IMHO this is not a warning, but an error.
And you had better be aware of how reference-counted types (e.g. string) are handled in function results: they are passed by the caller as hidden var parameters, so the result of a function may be set even if an exception is raised during function execution!