Tag - 64bit

Entries feed - Comments feed


Fastest AES-PRNG, AES-CTR and AES-GCM Delphi implementation

Last week, I committed new ASM implementations of our AES-PRNG, AES-CTR and AES-GCM for mORMot 2.
They handle eight 128-bit at once in an interleaved fashion, as permitted by the CTR chaining mode. The aes-ni opcodes (aesenc aesenclast) are used for AES process, and the GMAC of the AES-GCM mode is computed using the pclmulqdq opcode.

Resulting performance is amazing: on my simple Core i3, I reach 2.6 GB/s for aes-128-ctr, and 1.5 GB/s for aes-128-gcm for instance - the first being actually faster than OpenSSL!

Continue reading


New AesNiHash for mORMot 2

I have just committed some new AesNiHash32 AesNiHash64 AesNiHash128 Hashers for mORMot 2. They are using AES-NI and SSE4.1 opcodes on x86_64 and i386. This implementation is faster than the fastest SSE4.1 crc32c and with a much higher usability (less collisions). Logic was extracted from the Go  […]

Continue reading


New Multi-thread Friendly Memory Manager for FPC written in x86_64 assembly

As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.

It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, and some unique tricks in the Delphi/FPC Memory Manager world.

It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version - instead of AVX, we use plain good (non-temporal) SSE2 opcode, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest  benefit of this memory manager - it leverages a killer feature of the Linux kernel for sure. By the way, we directly call the Kernel without the need of the libc.

We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled the multi-threading, especially by adding some additional small blocks for GetMem (which is a less expensive notion of "arenas" as used in FastMM5 and most C allocators), introducing an innovatice and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.

It runs all our regression tests with huge performance and stability - including multi-threaded tests with almost no slow down: sleep is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.

Continue reading


Faster Double-To-Text Conversion

On server side, a lot of CPU is done processing conversions to or from text. Mainly JSON these days.

In mORMot, we take care a lot about performance, so we have rewritten most conversion functions to have something faster than the Delphi or FPC RTL can offer.
Only float to text conversion was not available. And RTL str/floattexttext performance, at least under Delphi, is not consistent among platforms.
So we just added a new Double-To-Text set of functions.

Continue reading


New move/fillchar optimized sse2/avx asm version

Our Open Source framework includes some optimized asm alternatives to RTL's move() and fillchar(), named MoveFast() and FillCharFast().

We just rewrote from scratch the x86_64 version of those, which was previously taken from third-party snippets.
The brand new code is meant to be more efficient and maintainable. In particular, we switched to SIMD 128-bit SSE2 or 256bit AVX memory access (if available), whereas current version was using 64-bit regular registers. The small blocks (i.e. < 32 bytes) process occurs very often, e.g. when processing strings, so has been tuned a lot. Non temporal instructions (i.e. bypassing the CPU cache) are used for biggest chunks of data. We tested ERMS support, but it was found of no benefit in respect to our optimized SIMD, and was actually slower than our non-temporal variants. So ERMS code is currently disabled in the source, and may be enabled on demand by a conditional.

FPC move() was not bad. Delphi's Win64 was far from optimized - even ERMS was poorly introduced in latest RTL, since it should be triggered only for blocks > 2KB. Sadly, Delphi doesn't support AVX assembly yet, so those opcodes would be available only on FPC.

Resulting numbers are talking by themselves. Working on Win64 and Linux, of course.

Continue reading


SQLite3 static linking for Delphi Win64

A long-awaited feature was the ability to create stand-alone mORMot Win64 applications via Delphi, with no external sqlite3-64.dll required.

It is now available, with proper integration, and encryption is working!

Continue reading


New AES-based SQLite3 encryption

We just committed a deep refactoring of the SynSQlite3Static.pas unit - and all units using static linking for FPC. It also includes a new encryption format for SQlite3, using AES, so much more secure than the previous one. This is a breaking change, so worth a blog article! Now all static .o .a  […]

Continue reading


Status of mORMot ORM SOA MVC with FPC

In the last weeks/months, we worked a lot with FPC.
Delphi is still our main IDE, due to its better debugging experience under Windows, but we target to have premium support of FPC, on all platforms, especially Linux.

The new Delphi Linux compiler is out of scope, since it is heavily priced, its performance is not so good, and ARC broke memory management so would need a deep review/rewrite of our source code, which we can't afford - since we have FPC which is, from our opinion,  a much better compiler for Linux.
Of course, you can create clients for Delphi Linux and FMX, as usual, using the cross-platform client parts of mORMot. But for server side, this compiler is not supported, and will probably never be.

Continue reading


Faster and cross-platform SynLZ

You probably know about our SynLZ compression unit, in pascal and x86 asm, which is very fast for compression with a good compression ratio, and proudly compete with LZ4 or Snappy. It is used in our framework everywhere, e.g. for WebSockets communication, for ECC encrypted file content, or to  […]

Continue reading


Delphi 10.2 Tokyo Compatibility: DCC64 broken

We are proud to announce compatibility of our mORMot Open Source framework with the latest Delphi 10.2 Tokyo compiler...
At least for Win32.

For Win64, the compiler was stuck at the end of the compilation, burning 100% of one CPU core...

A bit disappointing, isn't it?

Continue reading


Linux support for Delphi to be available end of 2016

Marco Cantu, product manager of Delphi/RAD Studio, did publish the official RAD Studio 2016 Product Approach and Roadmap.
The upcoming release has a codename known as "BigBen", and should be called Delphi 10.1 Berlin, as far as I understand.

After this summer, another release, which codename is "Godzilla", will support Linux as a compiler target, in its Delphi 10.2 Tokyo release.
This is a very good news, and some details are given.
I've included those official names to mORMot's internal compiler version detection.
Thanks Marco for the information, and pushing in this direction!

My only concern is that it would be "ARC-enabled"...

Continue reading


Delphi 10 Seattle Win64 compiler Heisenbug: unusable target

Andy reported that he was not able to validate its IDE fix pack for Delphi 10 Seattle, due to its Win64 compiler not being deterministic anymore. The generated code did vary, from one build to other. Sadly, on our side, we identified that the code generated by the Win64 compiler of Delphi 10 Seattle  […]

Continue reading


SynCrypto: SSE4 x64 optimized asm for SHA-256

We have just included some optimized x64 assembler to our Open Source SynCrypto.pas unit so that SHA-256 hashing will perform at best speed.
It is an adaptation from tuned Intel's assembly macros, which makes use of the SSE4 instruction set, if available.

Continue reading


BREAKING CHANGE - TSQLRecord.ID primary key changed to TID: Int64

Up to now, the TSQLRecord.ID property was defined in mORMot.pas as a plain PtrInt/NativeInt (i.e. Integer under Win32), since it was type-cast as pointer for TSQLRecord published properties.
We introduced a new TID type, so that the ORM primary key would now be defined as Int64.

All the framework ORM process relies on the TSQLRecord class.
This abstract TSQLRecord class features a lot of built-in methods, convenient to do most of the ORM process in a generic way, at record level.

It first defines a primary key field, defined as ID: TID, i.e. as Int64 in mORMot.pas:

  TID = type Int64;
  TSQLRecord = class(TObject)
    property ID: TID read GetID write fID;

In fact, our ORM relies now on a Int64 primary key, matching the SQLite3 ID/RowID primary key.
This primary key will be used as RESTful resource identifier, for all CRUD operations.

Continue reading


Sub-optimized Win64 Delphi compiler: missing branch table for case of

As we already stated here, the Delphi compiler for the Win64 target performs well, as soon as you by-pass the RTL and its sub-optimized implementation - as we do for mORMot.
In fact, our huge set of regression tests perform only 10% slower on Win64, when compared to Win32.
But we got access to much more memory - which is not a huge gain for a mORMot server, which uses very little of RAM - so may be useful in some cases, when you need a lot of structures to be loaded in your RAM.

Slowdown on Win64 is mostly due to biggest pointer size, which will use twice the memory, hence may generate a larger number of cache misses (failed attempts to read or write a piece of data in the cache, which results in a main memory access with much longer latency).
But in Delphi, apart from the RTL which may need more tuning about performance (but seems not to be a priority on Embarcadero side), is also sometimes less efficient when generating the code.
For instance, sounds like if case ... of ... end statements do not generated branch table instructions on Win64, whereas it does for Win32 - and FPC does for any x64 platform it supports.

Continue reading


Performance comparison from Delphi 6, 7, 2007, XE4 and XE6

Since there was recently some articles about performance comparison between several versions of the Delphi compiler, we had to react, and gives our personal point of view.

IMHO there won't be any definitive statement about this.
I'm always doubtful about any conclusion which may be achieved with such kind of benchmarks.
Asking "which compiler is better?" is IMHO a wrong question.
As if there was some "compiler magic": the new compiler will be just like a new laundry detergent - it will be cleaner and whiter...

Performance is not about marketing.
Performance is an iterative process, always a matter of circumstances, and implementation.

Circumstances of the benchmark itself.
Each benchmark will report only information about the process it measured.
What you compare is a limited set of features, running most of the time an idealized and simplified pattern, which shares nothing with real-world process.

Implementation is what gives performance.
Changing a compiler will only gives you some percents of time change.
Identifying the true bottlenecks of an application via a profiler, then changing the implementation of the identified bottlenecks may give order of magnitudes of speed improvement.
For instance, multi-threading abilities can be achieved by following some simple rules.

With our huge set of regression tests, we have at hand more than 16,500,000 individual checks, covering low-level features (like numerical and text marshaling), or high-level process (like concurrent client/server and database multi-threaded process).

You will find here some benchmarks run with Delphi 6, 7, 2007, XE4 and XE6 under Win32, and XE4 and XE6 under Win64.
In short, all compilers performs more or less at the same speed.
Win64 is a little slower than Win32, and the fastest appears to be Delphi 7, using our enhanced and optimized RTL.

Continue reading


New crc32c() function using optimized asm and SSE 4.2 instruction

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking.
There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks.

We just introduced to mORMot's core unit (SynCommons.pas) a fast and efficient crc32c() function.

It will use either:

  • Optimized x86 asm code, with unrolled loops;
  • SSE 4.2 hardware crc32 instruction, if available.

Resulting speed is very good.
This is for sure the fastest CRC function available in Delphi.
Note that there is a version dedicated to each Win32 and Win64 platform - both performs at the same speed!

In fact, most popular file formats and protocols (Ethernet, MPEG-2, ZIP, RAR, 7-Zip, GZip, and PNG) use the polynomial $04C11DB7, while Intel's hardware implementation is based on another polynomial, $1EDC6F41 (used in iSCSI and Btrfs).
So you would not use this new crc32c() function to replace the zlib's crc32() function, but as a convenient very fast hashing function at application level.
For instance, our TDynArray wrapper will use it for fast items hashing.

Continue reading


Delphi XE4 NextGen compiler: using byte instead of ansichar?

When I first read the technical white paper covering all of the language changes in XE4 for mobile development (tied to the new ARM LLVM-based Delphi compiler), I have to confess I was pretty much confused.

Two great mORMot users just asked for XE4/iOS support of mORMot.

Win32/Win64 support for XE4 will be done as soon as we got a copy of it.
I suspect the code already works, since it was working as expected with XE3, and we rely on our own set of low-level functions for most internal work.

But iOS-targetting is more complex, due to the NextGen compiler, mainly.

Continue reading


Download latest version of sqlite3.dll for Windows 64 bit

Update: We now build the amalgamation file with mingw and release the latest version of SQLite3, from this direct SQLite3-64.7z link, as soon as it is published on the SQLite3 site. Up to now, there is no official Win64 version of the SQlite3 library released in http://sqlite.org.. It is in fact  […]

Continue reading


x64 optimized asm of FillChar() and Move() for Win64

We have included x64 optimized asm of FillChar() and Move() for Win64 - for corresponding compiler targets, i.e. Delphi XE2 and XE3. It will handle properly cache prefetch and appropriate SSE2 move instructions.  The System.pas unit of Delphi RTL will be patched at startup, unless the NOX64PATCHRTL  […]

Continue reading

- page 1 of 2