2022-02-15

mORMot 2 ORM Performance

The official release of mORMot 2 is around the edge. It may be the occasion to show some data persistence performance numbers, in respect to mORMot 1.

For the version 2 of our framework, its ORM feature has been enhanced and tuned in several aspects: REST routing optimization, ORM/JSON serialization, and in-memory and SQL engines tuning. Numbers are talking. You could compare with any other solution, and compile and run the tests by yourself for both framework, and see how it goes on your own computer or server.
In a nutshell, we almost reach 1 million inserts per second on SQLite3, and are above the million inserts in our in-memory engine. Reading speed is 1.2 million and 1.7 million respectively. From the object to the storage, and back. And forcing AES-CTR encryption on disk almost don't change anything. Now we are talking. ;)

Continue reading

2021-09-21

Delphi 10.4 / Delphi 11 Alexandria Breaking Changes

The latest revision of Delphi, named Delphi 11 Alexandria, is out.
A lot of new features, some enhanced platforms. Nice!
But it is also for us the opportunity to come back to some breaking changes, which appeared in Delphi 10.4 earlier this year, and are now "officially" part of Delphi 11.

The main breaking change of Delphi 10.4 and later, as reported by mORMot users, is the new lifetime of local variables.
TL&LR: a local variable which is not explicitly declared, but returned by a function may be released as soon as it is not used any more, whereas in the original implementation, it was allocated as a regular local variable, and we could expect its lifetime to remain active up to the end of the function. With Delphi 10.4, it is not the case any more: the compiler could release/clear the local variable sooner, to reduce the allocation pressure.

Idea behind this change is that it may have better register allocation within the function, so it "may" theoretically result in faster code. Not convinced about it, anyway - we will discuss that.
The main thing is that it could break existing code, because it breaks the Delphi compiler expectation since decades.
Some perfectly fine working code would end to work as expected. We have identified several use cases with mORMot which are affected by this change. Since it seems there will be no coming back from Delphi point of view, it is worth a blog article. ;)

Continue reading

2021-08-17

mORMot 2 on Ampere AARM64 CPU

Last weeks, we have enhanced mORMot support to one of the more powerful AARM64 CPU available: the Ampere Altra CPU, as made available on the Oracle Cloud Infrastructure.

Long story short, this is an amazing hardware to run on server side, with performance close to what Intel/AMD offers, but with almost linear multi-core scalability. The FPC compiler is able to run good code on it, and our mORMot 2 library is able to use the hardware accelerated opcodes for AES, SHA2, and crc32/crc32c.

Continue reading

2021-02-22

OpenSSL 1.1.1 Support for mORMot 2

Why OpenSSL? OpenSSL is the reference library for cryptography and secure TLS/HTTPS communication. It is part of most Linux/BSD systems, and covers a lot of use cases and algorithms. Even if it had some vulnerabilities in the past, it has been audited and validated for business use. Some algorithms  […]

Continue reading

2021-02-13

Fastest AES-PRNG, AES-CTR and AES-GCM Delphi implementation

Last week, I committed new ASM implementations of our AES-PRNG, AES-CTR and AES-GCM for mORMot 2.
They handle eight 128-bit at once in an interleaved fashion, as permitted by the CTR chaining mode. The aes-ni opcodes (aesenc aesenclast) are used for AES process, and the GMAC of the AES-GCM mode is computed using the pclmulqdq opcode.

Resulting performance is amazing: on my simple Core i3, I reach 2.6 GB/s for aes-128-ctr, and 1.5 GB/s for aes-128-gcm for instance - the first being actually faster than OpenSSL!

Continue reading

2021-02-12

New AesNiHash for mORMot 2

I have just committed some new AesNiHash32 AesNiHash64 AesNiHash128 Hashers for mORMot 2. They are using AES-NI and SSE4.1 opcodes on x86_64 and i386. This implementation is faster than the fastest SSE4.1 crc32c and with a much higher usability (less collisions). Logic was extracted from the Go  […]

Continue reading

2020-11-04

EKON 24 Presentation Slides

EKON_24.png, Nov 2020

EKON 24 just finished. "The conference for Delphi & more" was fully online this year, due to the viral context... But this was a great event, and I am very happy to have been part of it. Please find the slides on my two sessions: mORMot 2 Performance: from Delphi to AVX2 Of course,  […]

Continue reading

2020-05-07

New Multi-thread Friendly Memory Manager for FPC written in x86_64 assembly

As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.

It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, and some unique tricks in the Delphi/FPC Memory Manager world.

It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version - instead of AVX, we use plain good (non-temporal) SSE2 opcode, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest  benefit of this memory manager - it leverages a killer feature of the Linux kernel for sure. By the way, we directly call the Kernel without the need of the libc.

We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled the multi-threading, especially by adding some additional small blocks for GetMem (which is a less expensive notion of "arenas" as used in FastMM5 and most C allocators), introducing an innovatice and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.

It runs all our regression tests with huge performance and stability - including multi-threaded tests with almost no slow down: sleep is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.

Continue reading

2018-11-12

EKON 22 Slides and Code

I've uploaded two sets of slides from my presentations at EKON 22 : Object Pascal Clean Code Guidelines Proposal High Performance Object Pascal Code on Servers with the associated source code The WorkShop about "Getting REST with mORMot" has a corresponding new Samples folder in our  […]

Continue reading

2018-02-07

Status of mORMot ORM SOA MVC with FPC

In the last weeks/months, we worked a lot with FPC.
Delphi is still our main IDE, due to its better debugging experience under Windows, but we target to have premium support of FPC, on all platforms, especially Linux.

The new Delphi Linux compiler is out of scope, since it is heavily priced, its performance is not so good, and ARC broke memory management so would need a deep review/rewrite of our source code, which we can't afford - since we have FPC which is, from our opinion,  a much better compiler for Linux.
Of course, you can create clients for Delphi Linux and FMX, as usual, using the cross-platform client parts of mORMot. But for server side, this compiler is not supported, and will probably never be.

Continue reading

2017-08-10

Faster and cross-platform SynLZ

You probably know about our SynLZ compression unit, in pascal and x86 asm, which is very fast for compression with a good compression ratio, and proudly compete with LZ4 or Snappy. It is used in our framework everywhere, e.g. for WebSockets communication, for ECC encrypted file content, or to  […]

Continue reading

2015-06-30

Faster String process using SSE 4.2 Text Processing Instructions STTNI

A lot of our code, and probably yours, is highly relying on text process. In our mORMot framework, most of its features use JSON text, encoded as UTF-8. Profiling shows that a lot of time is spent computing the end of a text buffer, or comparing text content. You may know that In its SSE4.2 feature  […]

Continue reading

2015-06-21

Why FPC may be a better compiler than Delphi

Almost every time I'm debugging some core part of our framework, I like to see the generated asm, and trying to optimize the pascal code for better speed - when it is worth it, of course! I just made a nice observation, when comparing the assembler generated by Delphi to FPC's output. Imagine you  […]

Continue reading

2015-02-21

SynCrypto: SSE4 x64 optimized asm for SHA-256

We have just included some optimized x64 assembler to our Open Source SynCrypto.pas unit so that SHA-256 hashing will perform at best speed.
It is an adaptation from tuned Intel's assembly macros, which makes use of the SSE4 instruction set, if available.

Continue reading

2015-01-15

AES-NI enabled for SynCrypto

Today, we committed a new patch to enable AES-NI hardware acceleration to our SynCrypto.pas unit. Intel® AES-NI is a new encryption instruction set that improves on the Advanced Encryption Standard (AES) algorithm and accelerates the encryption of data on newer processors. Of course, all this is  […]

Continue reading

2014-05-25

New crc32c() function using optimized asm and SSE 4.2 instruction

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking.
There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks.

We just introduced to mORMot's core unit (SynCommons.pas) a fast and efficient crc32c() function.

It will use either:

  • Optimized x86 asm code, with unrolled loops;
  • SSE 4.2 hardware crc32 instruction, if available.

Resulting speed is very good.
This is for sure the fastest CRC function available in Delphi.
Note that there is a version dedicated to each Win32 and Win64 platform - both performs at the same speed!

In fact, most popular file formats and protocols (Ethernet, MPEG-2, ZIP, RAR, 7-Zip, GZip, and PNG) use the polynomial $04C11DB7, while Intel's hardware implementation is based on another polynomial, $1EDC6F41 (used in iSCSI and Btrfs).
So you would not use this new crc32c() function to replace the zlib's crc32() function, but as a convenient very fast hashing function at application level.
For instance, our TDynArray wrapper will use it for fast items hashing.

Continue reading

2013-12-05

New Open Source Multi-Thread ready Memory Manager: SAPMM

Do you remember this former article about scalability of the Delphi memory manager, in multi-thread execution context?

Our SynScaleMM is still experimental.
But did pretty well, for an experiment!

At first, you can take a look at ScaleMM2, which is more stable, and based on the same ground.

But a new multi-thread friendly memory manager for Delphi just came out.
It is in fact the anonymous (and already famous) "NN memory manager" Primož talked about in his article about string building and memory managers.


(Note that in this article, our SynScaleMM was found to be scaling very well, but on the other hand, Primož did compile its benchmark program in Debug mode, so our TTextWriter was not in good shape: when you compile in Release mode, optimizations and inlining are ON, and our good TTextWriter just flies... See the note at the beginning of the article - this is why I never find those benchmarks very informative. I always prefer profiling from the real world with real useful process… and was never convinced by any such naive benchmark.)

OK, back to our business!

SapMM is an interesting beast.
https://code.google.com/p/sapmm

Sounds like if Alexei (the initial coder) has a C coding background. But that's fine when you have to deal with low-level structures and algorithms, as required by a memory manager. :)
It features everything we may ask for such a piece of code: clear design, optimized code (mostly by inlining process), memory leak reporting, some parameters for tuning.

It is only for Delphi XE (and up) under Win32 by now, but contributors are welcome!
It is used in production since more than half a year, and it passed all FastcodeMM benchmark tests.

If you want a direct link of the today's source code, without SVN, you may try this direct link from our site.
(but it probably will never be updated - you are warned)

Continue reading

2013-05-21

Performance issue in NextGen ARC model

Apart from being very slow during compilation, the Delphi NextGen compiler introduced a new memory model, named ARC.

We already spoke about ARC years ago, so please refer to our corresponding blog article for further information, especially about how Apple did introduce ARC to iOS instead of the Garbage Collector model.

About how ARC is to be used in the NextGen compiler, take a look at Marco's blog article, and its linked resources.

But the ARC model, as implemented by Embarcadero, has at least one huge performance issue, in the way weak references, and zeroing weak pointers have been implemented.
I do not speak about the general slow down introduced during every class/record initialization/finalization, which is noticeable, but not a big concern.

If you look at XE4 internals, you will discover a disappointing global lock introduced in the RTL.

Continue reading

2013-03-13

x64 optimized asm of FillChar() and Move() for Win64

We have included x64 optimized asm of FillChar() and Move() for Win64 - for corresponding compiler targets, i.e. Delphi XE2 and XE3. It will handle properly cache prefetch and appropriate SSE2 move instructions.  The System.pas unit of Delphi RTL will be patched at startup, unless the NOX64PATCHRTL  […]

Continue reading

2013-03-07

64 bit compatibility of mORMot units

I'm happy to announce that mORMot units are now compiling and working great in 64 bit mode, under Windows.
Need a Delphi XE2/XE3 compiler, of course!

ORM and services are now available in Win64, on both client and server sides.
Low-level x64 assembler stubs have been created, tested and optimized.
UI part is also available... that is grid display, reporting (with pdf export and display anti-aliasing), ribbon auto-generation, SynTaskDialog, i18n... the main SynFile demo just works great!

Overall impression is very positive, and speed is comparable to 32 bit version (only 10-15% slower).

Speed decrease seems to be mostly due to doubled pointer size, and some less optimized part of the official Delphi RTL.
But since mORMot core uses its own set of functions (e.g. for JSON serialization, RTTI support or interface calls or stubbing), we were able to release the whole 64 bit power of your hardware.

Delphi 64 bit compiler sounds stable and efficient. Even when working at low level, with assembler stubs.
Generated code sounds more optimized than the one emitted by FreePascalCompiler - and RTL is very close to 32 bit mode.
Overall, VCL conversion worked as easily than a simple re-build.
Embarcadero's people did a great job for VCL Win64 support, here!

Continue reading

- page 1 of 3