2020-05-07

New Multi-thread Friendly Memory Manager for FPC written in x86_64 assembly

As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.

It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, and some unique tricks in the Delphi/FPC Memory Manager world.

It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version - instead of AVX, we use plain good (non-temporal) SSE2 opcode, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest  benefit of this memory manager - it leverages a killer feature of the Linux kernel for sure. By the way, we directly call the Kernel without the need of the libc.

We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled the multi-threading, especially by adding some additional small blocks for GetMem (which is a less expensive notion of "arenas" as used in FastMM5 and most C allocators), introducing an innovatice and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.

It runs all our regression tests with huge performance and stability - including multi-threaded tests with almost no slow down: sleep is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.

Continue reading

2018-11-12

EKON 22 Slides and Code

I've uploaded two sets of slides from my presentations at EKON 22 : Object Pascal Clean Code Guidelines Proposal High Performance Object Pascal Code on Servers with the associated source code The WorkShop about "Getting REST with mORMot" has a corresponding new Samples folder in our  […]

Continue reading

2018-02-07

Status of mORMot ORM SOA MVC with FPC

In the last weeks/months, we worked a lot with FPC.
Delphi is still our main IDE, due to its better debugging experience under Windows, but we target to have premium support of FPC, on all platforms, especially Linux.

The new Delphi Linux compiler is out of scope, since it is heavily priced, its performance is not so good, and ARC broke memory management so would need a deep review/rewrite of our source code, which we can't afford - since we have FPC which is, from our opinion,  a much better compiler for Linux.
Of course, you can create clients for Delphi Linux and FMX, as usual, using the cross-platform client parts of mORMot. But for server side, this compiler is not supported, and will probably never be.

Continue reading

2017-08-10

Faster and cross-platform SynLZ

You probably know about our SynLZ compression unit, in pascal and x86 asm, which is very fast for compression with a good compression ratio, and proudly compete with LZ4 or Snappy. It is used in our framework everywhere, e.g. for WebSockets communication, for ECC encrypted file content, or to  […]

Continue reading

2015-06-30

Faster String process using SSE 4.2 Text Processing Instructions STTNI

A lot of our code, and probably yours, is highly relying on text process. In our mORMot framework, most of its features use JSON text, encoded as UTF-8. Profiling shows that a lot of time is spent computing the end of a text buffer, or comparing text content. You may know that In its SSE4.2 feature  […]

Continue reading

2015-06-21

Why FPC may be a better compiler than Delphi

Almost every time I'm debugging some core part of our framework, I like to see the generated asm, and trying to optimize the pascal code for better speed - when it is worth it, of course! I just made a nice observation, when comparing the assembler generated by Delphi to FPC's output. Imagine you  […]

Continue reading

2015-02-21

SynCrypto: SSE4 x64 optimized asm for SHA-256

We have just included some optimized x64 assembler to our Open Source SynCrypto.pas unit so that SHA-256 hashing will perform at best speed.
It is an adaptation from tuned Intel's assembly macros, which makes use of the SSE4 instruction set, if available.

Continue reading

2015-01-15

AES-NI enabled for SynCrypto

Today, we committed a new patch to enable AES-NI hardware acceleration to our SynCrypto.pas unit. Intel® AES-NI is a new encryption instruction set that improves on the Advanced Encryption Standard (AES) algorithm and accelerates the encryption of data on newer processors. Of course, all this is  […]

Continue reading

2014-05-25

New crc32c() function using optimized asm and SSE 4.2 instruction

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking.
There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks.

We just introduced to mORMot's core unit (SynCommons.pas) a fast and efficient crc32c() function.

It will use either:

  • Optimized x86 asm code, with unrolled loops;
  • SSE 4.2 hardware crc32 instruction, if available.

Resulting speed is very good.
This is for sure the fastest CRC function available in Delphi.
Note that there is a version dedicated to each Win32 and Win64 platform - both performs at the same speed!

In fact, most popular file formats and protocols (Ethernet, MPEG-2, ZIP, RAR, 7-Zip, GZip, and PNG) use the polynomial $04C11DB7, while Intel's hardware implementation is based on another polynomial, $1EDC6F41 (used in iSCSI and Btrfs).
So you would not use this new crc32c() function to replace the zlib's crc32() function, but as a convenient very fast hashing function at application level.
For instance, our TDynArray wrapper will use it for fast items hashing.

Continue reading

2013-12-05

New Open Source Multi-Thread ready Memory Manager: SAPMM

Do you remember this former article about scalability of the Delphi memory manager, in multi-thread execution context?

Our SynScaleMM is still experimental.
But did pretty well, for an experiment!

At first, you can take a look at ScaleMM2, which is more stable, and based on the same ground.

But a new multi-thread friendly memory manager for Delphi just came out.
It is in fact the anonymous (and already famous) "NN memory manager" Primož talked about in his article about string building and memory managers.


(Note that in this article, our SynScaleMM was found to be scaling very well, but on the other hand, Primož did compile its benchmark program in Debug mode, so our TTextWriter was not in good shape: when you compile in Release mode, optimizations and inlining are ON, and our good TTextWriter just flies... See the note at the beginning of the article - this is why I never find those benchmarks very informative. I always prefer profiling from the real world with real useful process… and was never convinced by any such naive benchmark.)

OK, back to our business!

SapMM is an interesting beast.
https://code.google.com/p/sapmm

Sounds like if Alexei (the initial coder) has a C coding background. But that's fine when you have to deal with low-level structures and algorithms, as required by a memory manager. :)
It features everything we may ask for such a piece of code: clear design, optimized code (mostly by inlining process), memory leak reporting, some parameters for tuning.

It is only for Delphi XE (and up) under Win32 by now, but contributors are welcome!
It is used in production since more than half a year, and it passed all FastcodeMM benchmark tests.

If you want a direct link of the today's source code, without SVN, you may try this direct link from our site.
(but it probably will never be updated - you are warned)

Continue reading

2013-05-21

Performance issue in NextGen ARC model

Apart from being very slow during compilation, the Delphi NextGen compiler introduced a new memory model, named ARC.

We already spoke about ARC years ago, so please refer to our corresponding blog article for further information, especially about how Apple did introduce ARC to iOS instead of the Garbage Collector model.

About how ARC is to be used in the NextGen compiler, take a look at Marco's blog article, and its linked resources.

But the ARC model, as implemented by Embarcadero, has at least one huge performance issue, in the way weak references, and zeroing weak pointers have been implemented.
I do not speak about the general slow down introduced during every class/record initialization/finalization, which is noticeable, but not a big concern.

If you look at XE4 internals, you will discover a disappointing global lock introduced in the RTL.

Continue reading

2013-03-13

x64 optimized asm of FillChar() and Move() for Win64

We have included x64 optimized asm of FillChar() and Move() for Win64 - for corresponding compiler targets, i.e. Delphi XE2 and XE3. It will handle properly cache prefetch and appropriate SSE2 move instructions.  The System.pas unit of Delphi RTL will be patched at startup, unless the NOX64PATCHRTL  […]

Continue reading

2013-03-07

64 bit compatibility of mORMot units

I'm happy to announce that mORMot units are now compiling and working great in 64 bit mode, under Windows.
Need a Delphi XE2/XE3 compiler, of course!

ORM and services are now available in Win64, on both client and server sides.
Low-level x64 assembler stubs have been created, tested and optimized.
UI part is also available... that is grid display, reporting (with pdf export and display anti-aliasing), ribbon auto-generation, SynTaskDialog, i18n... the main SynFile demo just works great!

Overall impression is very positive, and speed is comparable to 32 bit version (only 10-15% slower).

Speed decrease seems to be mostly due to doubled pointer size, and some less optimized part of the official Delphi RTL.
But since mORMot core uses its own set of functions (e.g. for JSON serialization, RTTI support or interface calls or stubbing), we were able to release the whole 64 bit power of your hardware.

Delphi 64 bit compiler sounds stable and efficient. Even when working at low level, with assembler stubs.
Generated code sounds more optimized than the one emitted by FreePascalCompiler - and RTL is very close to 32 bit mode.
Overall, VCL conversion worked as easily than a simple re-build.
Embarcadero's people did a great job for VCL Win64 support, here!

Continue reading

2012-12-20

How to make it fast?

On our forum, a clever question was posted about publishing some enhanced RTL functions for newer versions of Delphi - as we did for Delphi 7 and 2007.

I was looking for a faster IntToStr implementation and discovered SynCommons.pas.
(....)
That's really too bad, SynCommons.pas really does contain some seriously fast stuff, people would greatly benefit from it if it was made general-purpose.

In fact, it would not be enough to change the RTL function implementations.
IMHO, to write something scalable, you need to get rid of such functions.

Continue reading

2012-11-13

Go language and Delphi

Do you know the Go language?

It is a strong-typed, compiled, cross-platform, and concurrent.
It features some nice high-level structures, like maps and strings, and still have very low-level access to the generated code: pointers are there, in a safe strong-typed implementation just like in pascal, and there is even a "goto", which sounds like an heresy to dogmatic coders, but does make sense to me, at least when you want to optimize code speed, in some rare cases.

It is created/pushed by Google, used internally by the company in their computer farms, and was designed by one of the original C creators.

Continue reading

2012-04-10

How function results are allocated

One potential issue with Delphi coding, is about how the result of a functions are implemented.

If you forget to set a result value to a function, you'll get a compiler warning.
Never underestimate such warning: IMHO this is not a warning, but an error.

And you should better be aware of the handling of reference-counted types (e.g. string) in a function results: those are passed the stack as var parameters, so the result of a function may be set even if an exception is raised during function execution!

Continue reading

2011-11-08

Currency is your friend

The currency type is the standard Delphi type to be used when storing and handling monetary values. It will avoid any rounding problems, with 4 decimals precision. It is able to safely store numbers in the range -922337203685477.5808 .. 922337203685477.5807. Should be enough for your pocket change.

As stated by the official Delphi documentation:

Currency is a fixed-point data type that minimizes rounding errors in monetary calculations. On the Win32 platform, it is stored as a scaled 64-bit integer with the four least significant digits implicitly representing decimal places. When mixed with other real types in assignments and expressions, Currency values are automatically divided or multiplied by 10000.

In fact, this type matches the corresponding OLE and .Net implementation of currency, and the one used by most database providers (when it comes to money, a dedicated type is worth the cost in a "rich man's world"). It is still implemented the same in the Win64 platform (since XE 2). The Int64 binary representation of the currency type (i.e. value*10000 as accessible via PInt64(aCurrencyValue)^) is a safe and fast implementation pattern.

In our framework, we tried to avoid any unnecessary conversion to float values when dealing with currency values. Some dedicated functions have been implemented for fast and secure access to currency published properties via RTTI, especially when converting values to or from JSON text. Using the Int64 binary representation can be not only faster, but also safer: you will avoid any rounding problem which may be introduced by the conversion to a float type. Rounding issues are a nightmare to track - it sounds safe to have a framework handling natively a currency type from the ground up.

Continue reading

2011-09-12

Using Extended in Delphi XE2 64 bit

Unfortunately, Delphi's 64-bit compiler (dcc64) and RTL do not support 80-bit extended floating point values on Win64, but silently alias Extended = Double on Win64.

There are situations, however, where this is clearly undesirable, e.g. if the additional precision gained from Extended is required.

The Open-source uTExtendedX87 unit provides a replacement FPU-backed 80-bit Extended floating point type (TExtendedX87) for Win64.

Continue reading

2011-08-28

Multi-threading and Delphi

Writing working multi-threaded code is not easy - it's even hard, as as a Delphi expert just wrote in his blog.

In fact, the first step into multi-thread application development could be:

"protect your shared variables with locks (aka critical sections), because you are not sure that the data you read/write is the same for all threads".

The CPU per-core cache is just one of the possible issues, which will lead into reading wrong values. Another issue which may lead into race condition is two threads writing to a resource at the same time: it's impossible to know which value will be stored afterward.

Continue reading

2011-08-08

Our mORMot won't hibernate this winter, thanks to FireMonkey

Everybody is buzzing about FireMonkey...

Our little mORMot will like FireMonkey!
Here is why...

Continue reading

- page 1 of 2