Tag - 64bit

Entries feed - Comments feed

2022-02-15

mORMot 2 ORM Performance

The official release of mORMot 2 is around the edge. It may be the occasion to show some data persistence performance numbers, in respect to mORMot 1.

For the version 2 of our framework, its ORM feature has been enhanced and tuned in several aspects: REST routing optimization, ORM/JSON serialization, and in-memory and SQL engines tuning. Numbers are talking. You could compare with any other solution, and compile and run the tests by yourself for both framework, and see how it goes on your own computer or server.
In a nutshell, we almost reach 1 million inserts per second on SQLite3, and are above the million inserts in our in-memory engine. Reading speed is 1.2 million and 1.7 million respectively. From the object to the storage, and back. And forcing AES-CTR encryption on disk almost don't change anything. Now we are talking. ;)

Continue reading

2021-11-16

EKON 25 Slides

EKON 25 at Düsseldorf was a great conference (konference?).

At last, a physical gathering of Delphi developers, mostly from Germany, but also from Europe - and even some from USA! No more virtual meetings, which may trigger the well known 'Abstract Error' on modern pascal coders.
There were some happy FPC users too - as I am now. :)

I have published the slides of my conferences, mostly about mORMot 2.
By the way, I wish we would be able to release officially mORMot 2 in December, before Christmas. I think it starts to be stabilized and already known to be used on production. We expect no more breaking change in the next weeks.

Continue reading

2021-08-17

mORMot 2 on Ampere AARM64 CPU

Last weeks, we have enhanced mORMot support to one of the more powerful AARM64 CPU available: the Ampere Altra CPU, as made available on the Oracle Cloud Infrastructure.

Long story short, this is an amazing hardware to run on server side, with performance close to what Intel/AMD offers, but with almost linear multi-core scalability. The FPC compiler is able to run good code on it, and our mORMot 2 library is able to use the hardware accelerated opcodes for AES, SHA2, and crc32/crc32c.

Continue reading

2021-05-08

Enhanced Faster ZIP Support in mORMot 2

The .zip format is from last century, back to the early DOS days, but can still be found everywhere. It is even hidden when you run a .docx document, a .jar application, or any Android app!
It is therefore (ab)used not only as archive format, but as application file format / container - even if in this respect using SQLite3 may have much more sense.

We recently enhanced our mormot.core.zip.pas unit:

  • to support Zip64,
  • with enhanced .zip read/write,
  • to have a huge performance boost during its process,
  • and to integrate better with signed executables.

Continue reading

2021-02-13

Fastest AES-PRNG, AES-CTR and AES-GCM Delphi implementation

Last week, I committed new ASM implementations of our AES-PRNG, AES-CTR and AES-GCM for mORMot 2.
They handle eight 128-bit at once in an interleaved fashion, as permitted by the CTR chaining mode. The aes-ni opcodes (aesenc aesenclast) are used for AES process, and the GMAC of the AES-GCM mode is computed using the pclmulqdq opcode.

Resulting performance is amazing: on my simple Core i3, I reach 2.6 GB/s for aes-128-ctr, and 1.5 GB/s for aes-128-gcm for instance - the first being actually faster than OpenSSL!

Continue reading

2021-02-12

New AesNiHash for mORMot 2

I have just committed some new AesNiHash32 AesNiHash64 AesNiHash128 Hashers for mORMot 2. They are using AES-NI and SSE4.1 opcodes on x86_64 and i386. This implementation is faster than the fastest SSE4.1 crc32c and with a much higher usability (less collisions). Logic was extracted from the Go  […]

Continue reading

2020-05-07

New Multi-thread Friendly Memory Manager for FPC written in x86_64 assembly

As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.

It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, and some unique tricks in the Delphi/FPC Memory Manager world.

It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version - instead of AVX, we use plain good (non-temporal) SSE2 opcode, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest  benefit of this memory manager - it leverages a killer feature of the Linux kernel for sure. By the way, we directly call the Kernel without the need of the libc.

We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled the multi-threading, especially by adding some additional small blocks for GetMem (which is a less expensive notion of "arenas" as used in FastMM5 and most C allocators), introducing an innovatice and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.

It runs all our regression tests with huge performance and stability - including multi-threaded tests with almost no slow down: sleep is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.

Continue reading

2020-03-28

Faster Double-To-Text Conversion

On server side, a lot of CPU is done processing conversions to or from text. Mainly JSON these days.

In mORMot, we take care a lot about performance, so we have rewritten most conversion functions to have something faster than the Delphi or FPC RTL can offer.
Only float to text conversion was not available. And RTL str/floattexttext performance, at least under Delphi, is not consistent among platforms.
So we just added a new Double-To-Text set of functions.

Continue reading

2020-02-17

New move/fillchar optimized sse2/avx asm version

Our Open Source framework includes some optimized asm alternatives to RTL's move() and fillchar(), named MoveFast() and FillCharFast().

We just rewrote from scratch the x86_64 version of those, which was previously taken from third-party snippets.
The brand new code is meant to be more efficient and maintainable. In particular, we switched to SIMD 128-bit SSE2 or 256bit AVX memory access (if available), whereas current version was using 64-bit regular registers. The small blocks (i.e. < 32 bytes) process occurs very often, e.g. when processing strings, so has been tuned a lot. Non temporal instructions (i.e. bypassing the CPU cache) are used for biggest chunks of data. We tested ERMS support, but it was found of no benefit in respect to our optimized SIMD, and was actually slower than our non-temporal variants. So ERMS code is currently disabled in the source, and may be enabled on demand by a conditional.

FPC move() was not bad. Delphi's Win64 was far from optimized - even ERMS was poorly introduced in latest RTL, since it should be triggered only for blocks > 2KB. Sadly, Delphi doesn't support AVX assembly yet, so those opcodes would be available only on FPC.

Resulting numbers are talking by themselves. Working on Win64 and Linux, of course.

Continue reading

2019-09-21

SQLite3 static linking for Delphi Win64

A long-awaited feature was the ability to create stand-alone mORMot Win64 applications via Delphi, with no external sqlite3-64.dll required.

It is now available, with proper integration, and encryption is working!

Continue reading

2018-03-12

New AES-based SQLite3 encryption

We just committed a deep refactoring of the SynSQlite3Static.pas unit - and all units using static linking for FPC. It also includes a new encryption format for SQlite3, using AES, so much more secure than the previous one. This is a breaking change, so worth a blog article! Now all static .o .a  […]

Continue reading

2018-02-07

Status of mORMot ORM SOA MVC with FPC

In the last weeks/months, we worked a lot with FPC.
Delphi is still our main IDE, due to its better debugging experience under Windows, but we target to have premium support of FPC, on all platforms, especially Linux.

The new Delphi Linux compiler is out of scope, since it is heavily priced, its performance is not so good, and ARC broke memory management so would need a deep review/rewrite of our source code, which we can't afford - since we have FPC which is, from our opinion,  a much better compiler for Linux.
Of course, you can create clients for Delphi Linux and FMX, as usual, using the cross-platform client parts of mORMot. But for server side, this compiler is not supported, and will probably never be.

Continue reading

2017-08-10

Faster and cross-platform SynLZ

You probably know about our SynLZ compression unit, in pascal and x86 asm, which is very fast for compression with a good compression ratio, and proudly compete with LZ4 or Snappy. It is used in our framework everywhere, e.g. for WebSockets communication, for ECC encrypted file content, or to  […]

Continue reading

2017-03-22

Delphi 10.2 Tokyo Compatibility: DCC64 broken

We are proud to announce compatibility of our mORMot Open Source framework with the latest Delphi 10.2 Tokyo compiler...
At least for Win32.

For Win64, the compiler was stuck at the end of the compilation, burning 100% of one CPU core...

A bit disappointing, isn't it?

Continue reading

2016-02-08

Linux support for Delphi to be available end of 2016

Marco Cantu, product manager of Delphi/RAD Studio, did publish the official RAD Studio 2016 Product Approach and Roadmap.
The upcoming release has a codename known as "BigBen", and should be called Delphi 10.1 Berlin, as far as I understand.

After this summer, another release, which codename is "Godzilla", will support Linux as a compiler target, in its Delphi 10.2 Tokyo release.
This is a very good news, and some details are given.
I've included those official names to mORMot's internal compiler version detection.
Thanks Marco for the information, and pushing in this direction!

My only concern is that it would be "ARC-enabled"...

Continue reading

2015-10-05

Delphi 10 Seattle Win64 compiler Heisenbug: unusable target

Andy reported that he was not able to validate its IDE fix pack for Delphi 10 Seattle, due to its Win64 compiler not being deterministic anymore. The generated code did vary, from one build to other. Sadly, on our side, we identified that the code generated by the Win64 compiler of Delphi 10 Seattle  […]

Continue reading

2015-02-21

SynCrypto: SSE4 x64 optimized asm for SHA-256

We have just included some optimized x64 assembler to our Open Source SynCrypto.pas unit so that SHA-256 hashing will perform at best speed.
It is an adaptation from tuned Intel's assembly macros, which makes use of the SSE4 instruction set, if available.

Continue reading

2014-11-14

BREAKING CHANGE - TSQLRecord.ID primary key changed to TID: Int64

Up to now, the TSQLRecord.ID property was defined in mORMot.pas as a plain PtrInt/NativeInt (i.e. Integer under Win32), since it was type-cast as pointer for TSQLRecord published properties.
We introduced a new TID type, so that the ORM primary key would now be defined as Int64.

All the framework ORM process relies on the TSQLRecord class.
This abstract TSQLRecord class features a lot of built-in methods, convenient to do most of the ORM process in a generic way, at record level.

It first defines a primary key field, defined as ID: TID, i.e. as Int64 in mORMot.pas:

type
  TID = type Int64;
  ...
  TSQLRecord = class(TObject)
  ...
    property ID: TID read GetID write fID;
  ...

In fact, our ORM relies now on a Int64 primary key, matching the SQLite3 ID/RowID primary key.
This primary key will be used as RESTful resource identifier, for all CRUD operations.

Continue reading

2014-06-30

Sub-optimized Win64 Delphi compiler: missing branch table for case of

As we already stated here, the Delphi compiler for the Win64 target performs well, as soon as you by-pass the RTL and its sub-optimized implementation - as we do for mORMot.
In fact, our huge set of regression tests perform only 10% slower on Win64, when compared to Win32.
But we got access to much more memory - which is not a huge gain for a mORMot server, which uses very little of RAM - so may be useful in some cases, when you need a lot of structures to be loaded in your RAM.

Slowdown on Win64 is mostly due to biggest pointer size, which will use twice the memory, hence may generate a larger number of cache misses (failed attempts to read or write a piece of data in the cache, which results in a main memory access with much longer latency).
But in Delphi, apart from the RTL which may need more tuning about performance (but seems not to be a priority on Embarcadero side), is also sometimes less efficient when generating the code.
For instance, sounds like if case ... of ... end statements do not generated branch table instructions on Win64, whereas it does for Win32 - and FPC does for any x64 platform it supports.

Continue reading

2014-06-09

Performance comparison from Delphi 6, 7, 2007, XE4 and XE6

Since there was recently some articles about performance comparison between several versions of the Delphi compiler, we had to react, and gives our personal point of view.

IMHO there won't be any definitive statement about this.
I'm always doubtful about any conclusion which may be achieved with such kind of benchmarks.
Asking "which compiler is better?" is IMHO a wrong question.
As if there was some "compiler magic": the new compiler will be just like a new laundry detergent - it will be cleaner and whiter...

Performance is not about marketing.
Performance is an iterative process, always a matter of circumstances, and implementation.

Circumstances of the benchmark itself.
Each benchmark will report only information about the process it measured.
What you compare is a limited set of features, running most of the time an idealized and simplified pattern, which shares nothing with real-world process.

Implementation is what gives performance.
Changing a compiler will only gives you some percents of time change.
Identifying the true bottlenecks of an application via a profiler, then changing the implementation of the identified bottlenecks may give order of magnitudes of speed improvement.
For instance, multi-threading abilities can be achieved by following some simple rules.

With our huge set of regression tests, we have at hand more than 16,500,000 individual checks, covering low-level features (like numerical and text marshaling), or high-level process (like concurrent client/server and database multi-threaded process).

You will find here some benchmarks run with Delphi 6, 7, 2007, XE4 and XE6 under Win32, and XE4 and XE6 under Win64.
In short, all compilers performs more or less at the same speed.
Win64 is a little slower than Win32, and the fastest appears to be Delphi 7, using our enhanced and optimized RTL.

Continue reading

- page 1 of 2