To content | To menu | To search

Tag - performance

Entries feed

2016, Saturday April 9

AES-256 based Cryptographically Secure Pseudo-Random Number Generator (CSPRNG)

Everyone knows about the pascal random() function.
It returns some numbers, using a linear congruential generator, with a multiplier of 134775813, in its Delphi implementation.
It is fast, but not really secure. Output is very predictable, especially if you forgot to execute the RandSeed() procedure.

In real world scenarios, safety always requires random numbers, e.g. for key/nonce/IV/salt/challenge generation.
The less predictable, the better.
We just included a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG) into our SynCrypto.pas unit.
The TAESPRNG class would use real system entropy to generate a sequence of pseudorandom bytes, using AES-256, so returning highly unpredictable content.

Continue reading...

2015, Tuesday November 17

Benefits of interface callbacks instead of class messages

If you compare with existing client/server SOA solutions (in Delphi, Java, C# or even in Go or other frameworks), mORMot's interface-based callback mechanism sounds pretty unique and easy to work with.

Most Events Oriented solutions do use a set of dedicated messages to propagate the events, with a centralized Message Bus (like MSMQ or JMS), or a P2P/decentralized approach (see e.g. ZeroMQ or NanoMsg). In practice, you are expected to define one class per message, the class fields being the message values. You would define e.g. one class to notify a successful process, and another class to notify an error. SOA services would eventually tend to be defined by a huge number of individual classes, with the temptation of re-using existing classes in several contexts.

Our interface-based approach allows to gather all events:

  • In a single interface type per notification, i.e. probably per service operation;
  • With one method per event;
  • Using method parameters defining the event values.

Since asynchronous notifications are needed most of the time, method parameters would be one-way, i.e. defined only as const - in such case, an evolved algorithm would transparently gather those outgoing messages, to enhance scalability when processing such asynchronous events. Blocking request may also be defined as var/out, as we will see below, inWorkflow adaptation.

Behind the scene, the framework would still transmit raw messages over IP sockets (currently over a WebSockets connection), like other systems, but events notification would benefit from using interfaces, on both server and client sides.
We will now see how...

Continue reading...

2015, Wednesday September 16

Feedback from the Wild

We just noticed a nice feedback from a mORMot user.

Vojko Cendak commented the well-known DataSnap analysis based on Speed & Stability tests blog article written by Roberto some months years (!) ago.
It is not meant to be the final word, perhaps there was some tuning possible for RTC (which is told to be very reliable), but it is worth a look:

We used 3 products: RO, RTC and Mormot.. I won’t speak about RO ( slow and heavy ). We tried RTC but was too very slow and CPU consuming in getting lots of 1000 .. 5000 dynamically fetching OPC tags (let’s say list of small objects) – at least once per second (one client). I mean Mormot is FAST and we’re glad to be so. We use Mormot in actual productions 24/7 on several sites: servers don’t even blink on client requests and run smoothly and reliably.

Thanks for the kind words!
We have a lot of feedback, around the world, from users of our little Open Source project, very happy with its abilities.
We try to make it always better! Open Source, and Delphi as a platform, do rock!

2015, Monday September 14

Performance issue in NextGen ARC model - much better now

Back in 2013, I found out an implementation weakness in the implementation of ARC weak references in the RTL.
A giant lock was freezing all threads and cores, so would decrease a lot the performance abilities of any ARC application, especially in multi thread.

I just investigated that things are now better.

Continue reading...

2015, Saturday August 15

Breaking Change in mORMot WebSockets binary protocol

Among all its means of transmission, our mORMot framework features WebSockets, allowing bidirectional communications, and interface-based callbacks for real time notification of SOA events.
After several months of use in production, we identified some needed changes for this just emerged feature.

We committed today a breaking change of the data layout used for our proprietary WebSockets binary protocol.
From our tests, it would increase the performance and decrease the resource consumption, especially in case of high number of messages.

Continue reading...

2015, Tuesday June 30

Faster String process using SSE 4.2 Text Processing Instructions STTNI

A lot of our code, and probably yours, is highly relying on text process.
In our mORMot framework, most of its features use JSON text, encoded as UTF-8.
Profiling shows that a lot of time is spent computing the end of a text buffer, or comparing text content.

You may know that In its SSE4.2 feature set, Intel added STTNI (String and Text New Instructions) opcodes.
They are several new instructions that perform character searches and comparison on two operands of 16 bytes at a time.

I've just committed optimized version of StrComp() and StrLen(), also used for our TDynArrayHashed wrapper.
The patch works from Delphi 5 up to XE8, and with FPC - unknown SSE4.2 opcodes have been entered as hexadecimal bytes, for compatibility with the last century compilers!
The resulting speed up may be worth it!

Next logical step would be to use those instruction in the JSON process itself.
It may speed up the parsing speed of our core functions (which is already very optimized, but written in a classical one-char-at-a-time reading).
Main benefit would be to read the incoming UTF-8 text buffer by blocks of 16 bytes, and performing several characters comparison in a few CPU cycles, with no branching.
Also JSON writing would benefit for it, since escaping could be speed up thanks to STTNI instructions.

Any feedback is welcome, as usual!

2015, Sunday June 21

Why FPC may be a better compiler than Delphi

Almost every time I'm debugging some core part of our framework, I like to see the generated asm, and trying to optimize the pascal code for better speed - when it is worth it, of course!
I just made a nice observation, when comparing the assembler generated by Delphi to FPC's output.

Imagine you compile the following lines (extracted from SynCommons.pas), to convert some number into ASCII characters:

c100 := val div 100;
PWord(P)^ := TwoDigitLookupW[val];

This divides a number by 100, then computes the modulo in val, to store two digits at a time. We did not use val := val mod 100 here, since mod would do another division, so we rely on a simple multiplication to compute the modulo.
You may know that for today's CPUs, integer multiplication is very optimized, taking a cycle (or less, thanks to its pipelines), whereas a division is much more expensive - if you have some spare time, take a look at this document, and you will find out that a div opcode could use 10 times more cycles then a mul - even with the latest CPU architectures.
Let's see how our two beloved compilers do their homework (with optimization enabled, of course)...

Delphi generates the following code for c100 := val div 100:

005082AB 8BC1             mov eax,ecx
005082AD BE64000000       mov esi,$00000064
005082B2 33D2             xor edx,edx
005082B4 F7F6             div esi

Whereas FPC generates the following:

0043AC48 8b55f8                   mov    -0x8(%ebp),%edx
0043AC4B b81f85eb51               mov    $0x51eb851f,%eax
0043AC50 f7e2                     mul    %edx
0043AC52 c1ea05                   shr    $0x5,%edx
0043AC55 8955f0                   mov    %edx,-0x10(%ebp)

Even if you are assembler agnostic, and once you did get rid of the asm textual representation (Delphi uses Intel's, whereas FPC/GDB follows AT&T), you can see that Delphi generates a classic (and slow) div esi opcode, whereas FPC uses a single multiplication, followed by a bit shift.

This optimization is known as "Reciprocal Multiplication", and I would let you read this article for mathematical reference - or this one.
It multiplies (mul) the number by the power of two reciprocal of 100 (which is the hexadecimal 51eb851f value), followed by a right shift (shr) of 5 bits.
Thanks to 32 bit rounding of the integer operations, this would in fact divide the number per 100.
Even it consists in two assembler opcodes, a mul + shr is in fact faster than a single div.

It is a shame that the Delphi compiler did not include this very common optimization, which is clearly a win for some very common tasks. 

Of course, the LLVM back-end used on the NextGen compiler can do it, be we may expect this classic optimization be part of the decades-old Delphi compiler.
And I'm still not convinced about the performance of the NextGen generated code, since the associated RTL is known to be slow, so won't benefit of LVVM optimization - which takes  a LOT of time to compile, by the way (much more than FPC).

Congrats, FPC folks!

2015, Monday May 18

CQRS Persistence Service of any DDD object with mORMot

We introduced DDD concepts some time ago, in a series of articles in this blog.
At that time, we proposed a simple way of using mORMot types to implement DDD in your applications.
But all Domain Entitities being tied to the framework TSQLRecord class did appear as a limitation, breaking the Persistence Ignorance principle, since it couples the DDD objects to the framework implementation details.

We introduced a new mORMotDDD.pas unit, which is able to easily create CQRS Persistence services for any plain Delphi class (the famous PODOs - Plain Old Delphi Objects).
No need to inherit from TSQLRecord, or pollute your class definition with attributes!

For instance, a TUser class may be persisted via such a service:

  IDomUserCommand = interface(IDomUserQuery)
    function Add(const aAggregate: TUser): TCQRSResult;
    function Update(const aUpdatedAggregate: TUser): TCQRSResult;
    function Delete: TCQRSResult;
    function Commit: TCQRSResult;

Here, the write operations are defined in a IDomUserCommand service, which is separated (but inherits) from IDomUserQuery, which is used for read operations.
Separating reads and writes is a powerful pattern also known as CQRS, i.e. Command Query Responsibility Segregation, which we followed when defining our persistence services.
The framework make it pretty easy to create such services for storing any kind of class type in any SQL or NoSQL engine, with almost no code to write.
Last but not least, using such interface-based services for data persistence will allow to stub or mock the data access layer, making unit testing straightforward: you would not fear to write TDD code any more!

Please refer to our updated documentation for this unique and powerful feature.
You may take a look at the corresponding dddDomUserTypes.pas, dddDomUserCQRS.pas, and dddInfraRepoUser.pas units, detailed as sample reference.
Feedback is welcome in our forum, as usual!

2015, Friday May 8

I do not like people shoot in my foot, do you?

There was some discussion about the new TStringHelper feature introduced in latest versions of Delphi.
I was told to be some kind of archaic guy, not able to see the benefit of this.
Reducing opinions to a conservative/progressive approach - another famous 10 kinds of coders - is very reductive.

Of course, this was IMHO unfair and my point was that I have the feeling that some decisions about the Delphi language and RTL are inadequate.
Some changes are welcome. I enjoy the introduction of generics - even if it is was painful, and even buggy (do not use TList<T> with managed record types in XE8!).
But some upcoming changes about the string policy - breaking everything just because we want to align with mainstream C# or Java habits - are just non sense to me.
I really think that Embarcadero deciders like to shoot their own foot.
Or - certainly worse - our own feet!

I will post here some part of the discussion...
So that we may be able to share our ideas.

Continue reading...

2015, Saturday February 21

SynCrypto: SSE4 x64 optimized asm for SHA-256

We have just included some optimized x64 assembler to our Open Source SynCrypto.pas unit so that SHA-256 hashing will perform at best speed.
It is an adaptation from tuned Intel's assembly macros, which makes use of the SSE4 instruction set, if available.

Continue reading...

2015, Monday February 16

Benchmarking JsonDataObjects JSON parser

There is a new player in town.
Since it has been written by Andreas Hausladen, the maintainer of the great Delphi IDE fix packs, this new JSON library is very promising.

And in fact, it is fast, and sounds pretty great!
Here are some numbers, compared with SuperObject, standard DBXJson, dwsJSON, QDAC and mORMot.
Please refer to previous benchmark articles about those libraries. We will now focus on JsonDataObjects.

Continue reading...

2015, Sunday February 1

Benchmarking QDAC3 JSON parser

Do you know QDAC3 ?
This is an open source project, from China (with Chinese comments and exception errors messages, but the methods and variables are in English).
It is cross-platform, and told to be very fast about JSON process.

You can download this Open Source project code from
And their blog - in Chinese - is at

So I included QDAC3 in our "25 - JSON performance" sample.
Numbers are talking, now.

Continue reading...

2015, Thursday January 15

AES-NI enabled for SynCrypto

Today, we committed a new patch to enable AES-NI hardware acceleration to our SynCrypto.pas unit.

Intel® AES-NI is a new encryption instruction set that improves on the Advanced Encryption Standard (AES) algorithm and accelerates the encryption of data on newer processors.

Of course, all this is available in the Delphi unit, from Delphi 6 to XE7: no external dll nor OS update is needed.
And it will work also on Linux, so could help encrypting the mORMot transmission with no power loss.

You have nothing to do: just upgrade your mORMot source code, then AES-NI instructions will be used, if the CPU offers it.
We have seen performance boost of more than 5x, depending on the size of the data to be encrypted.


2015, Saturday January 10

mORMot under Linux thanks to FPC

You can use the FreePascal Compiler (FPC) to compile the mORMot framework source code, targetting Windows and Linux.

Linux is a premium target for cheap and efficient server Hosting. Since mORMot has no dependency, installing a new mORMot server is as easy as copying its executable on a blank Linux host, then run it. No need to install any framework nor runtime. You could even use diverse operating systems (several Linux or Windows Server versions) in your mORMot servers farm, with minimal system requirements, and updates.

We will now see how to write your software with Linux-compiling in mind, and also give some notes about how to install a Linux Virtual Machine with Lazarus on your Windows computer, compiling both FPC and Lazarus from their SVN latest sources!

Continue reading...

2014, Wednesday December 31

2015: the future of mORMot is BigData

How would be 2015 like for our little rodents?
Due to popular request of several users of mORMot, we identified and designed some feature requests dedicated to BigData process.

In fact, your data is the new value, especially if you propose SaaS (Software As A Service) hosting to your customers, with a farm of mORMot servers.
Recent Linux support for mORMot servers, together with the high performance and installation ease of our executable, open the gate to cheap cloud-based hosting.
As a consequence, a lot of information would certainly be gathered by your mORMot servers, and a single monolithic database is not an option any more.

For mORMot solutions hosted in cloud, a lot of data may be generated. The default SQLite3 storage engine may be less convenient, once it reaches some GB of file content. Backup becomes to be slow and inefficient, and hosting this oldest data in the main DB, probably stored on an expensive SSD, may be a lost of resource. Vertical scaling is limited by hardware and price factors.

This is were data sharding comes into scene.
Note that sharding is not replication/backup, nor clustering, nor just spreading. We are speaking about application-level data splitting, to ease maintenance and horizontal scalability of mORMot servers.

Data sharding could already be implemented with mORMot servers, thanks to TSQLRestStorage:

  • Using TSQLRestStorageExternal: any table may have its own external SQL database engine, may be in its separated DB server;
  • Using TSQLRestStorageMongoDB: any table may be stored on a MongoDB cluster, with its own sharding abilities;
  • Using TSQLRestStorageRemote: each table may have its own remote ORM/REST server.

But when data stored in a single table tends to grow without limit, this feature is not enough.
Let's see how the close future of mORMot looks like.

Continue reading...

2014, Tuesday November 18

HTTP remote access for SynDB SQL execution

For mORMot, we developed a fully feature direct access layer to any RDBMS, implemented in the SynDB.pas unit.

You can use those SynDB classes to execute any SQL statement, without any link to the framework ORM.
At reading, the resulting performance is much higher than using the standard TDataSet component, which is in fact a true performance bottleneck.
It has genuine features, like column access via late-binding, an innovative ISQLDBRows interface, and ability to directly access the low-level binary buffers of the database clients.

We just added a nice feature to those classes: the ability to access remotely, via plain HTTP, to any SynDB supported database!

Continue reading...

2014, Friday October 24

MVC/MVVM Web Applications

We almost finished implementing a long-standing feature request, introducing MVC / MVVM for Web Applications (like RubyOnRails) in mORMot.
This is a huge step forward, opening new perspectives not only to our framework, but for the Delphi community.
In respect to the existing MVC frameworks for Delphi, our  solution is closer to Delphi On Rails (by the convention-over-configuration pattern) than the Delphi MVC Framework or XMM.
The mORMot point of view is unique, and quite powerful, since it is integrated with other parts of our framework, as its ORM/ODM or interface-based services.
Of course, this is a work in progress, so you are welcome to put your feedback, patches or new features!

We will now explain how to build a MVC/MVVM web application using mORMot, starting from the "30 - MVC Server" sample.
First of all, check the source in our GitHub repository: two .pas files, and a set of minimalist Mustache views.

This little web application publishes a simple BLOG, not fully finished yet (this is a Sample, remember!).
But you can still execute it in your desktop browser, or any mobile device (thanks to a simple Bootstrap-based responsive design), and see the articles list, view one article and its comments, view the author information, log in and out.

This sample is implemented as such:

MVVM Source mORMot
Model MVCModel.pas TSQLRestServerDB ORM over a SQlite3 database
View *.html Mustache template engine in the Views sub-folder
ViewModel MVCViewModel.pas Defined as one IBlogApplication interface

For the sake of the simplicity, the sample will create some fake data in its own local SQlite3 database, the first time it is executed.

Continue reading...

2014, Friday September 12

Faster WideString process for good old non Unicode Delphi 6-2007

For pre-Unicode versions of Delphi, the unique way of having UTF-16 native type is to use the WideString type.
This type, under Windows, matched the BSTR managed type, as used by OLE and COM components.

In Delphi, WideString implementation calls directly the corresponding Windows API, and do not use the main Delphi heap manager.
Even if since Vista, this API did have a huge speed-up, it is still in practice much slower than the regular string type. Problems is not about UTF-16 encoding, but about the memory allocation, which is shared among processes, using the Windows global heap, and is much slower than our beloved FastMM4.
Newer versions of Delphi (since Delphi 2009) feature a refactored string = UnicodeString type, which relies on FastMM4 and not the Windows API, and is much faster than WideString.

Within our mORMot framework, we by-passed this limitation by using our RawUTF8 type, which is UTF-8 encoded, so as Unicode ready as the new UnicodeString type, and pretty fast.
In a recent internal project, we had to use a lot of WideString instances, to support UTF-16 encoding in Delphi 7/2007, involving a lot of text.

It sounded to be very slow, so we had to do something!

This is where our new SynFastWideString unit comes in.

Purpose of this unit is to patch the system.pas unit for older versions of Delphi, so that WideString memory allocation would use FastMM4 instead of the slow BSTR Windows API.
It will speed up the WideString process a lot, especially when a lot of content is allocated, since FastMM4 is much more aggressive than Windows' global heap and the BSTR slow API. It could be more than 50 times faster, especially when releasing the used memory.
The WideString implementation pattern does NOT feature Copy-On-Write, so is still slower than the string UnicodeString type as implemented since Delphi 2009. This is the reason why this unit won't do anything on Unicode versions of the compiler, since the new string type is to be preferred there.

Continue reading...

2014, Saturday August 16

Will WebSocket replace HTTP? Does it scale?

You certainly noticed that WebSocket is the current trendy flavor for any modern web framework.
But does it scale? Would it replace HTTP/REST?
There is a feature request ticket about them for mORMot, so here are some thoughts - matter of debate, of course!
I started all this by answering a StackOverflow question, in which the actual answers were not accurate enough, to my opinion.

From my point of view, Websocket - as a protocol - is some kind of monster.

You start a HTTP stateless connection, then switch to WebSocket mode which releases the TCP/IP dual-direction layer, then you may switch later on back to HTTP...
It reminds me some kind of monstrosity, just like encapsulating everything over HTTP, using XML messages... Just to bypass the security barriers... Just breaking the OSI layered model...
It reminds me the fact that our mobile phone data providers do not use broadcasting for streaming audio and video, but regular Internet HTTP servers, so the mobile phone data bandwidth is just wasted when a sport event occurs: every single smart phone has its own connection to the server, and the same video is transmitted in parallel, saturating the single communication channel... Smart phones are not so smart, aren't they?

WebSocket sounds like a clever way to circumvent a limitation...
But why not use a dedicated layer?
I hope HTTP 2.0 would allow pushing information from the server, as part of the standard... and in one decade, we probably will see WebSocket as a deprecated technology.
You have been warned. Do not invest too much in WebSockets..

OK. Back to our existential questions...
First of all, does the WebSocket protocol scale?
Today, any modern single server is able to server millions of clients at once.
Its HTTP server software has just to be is Event-Driven (IOCP) oriented (we are not in the old Apache's one connection = one thread/process equation any more).
Even the HTTP server built in Windows (http.sys - which is used in mORMot) is IOCP oriented and very efficient (running in kernel mode).
From this point of view, there won't be a lot of difference at scaling between WebSocket and a regular HTTP connection. One TCP/IP connection uses a little resource (much less than a thread), and modern OS are optimized for handling a lot of concurrent connections: WebSocket and HTTP are just OSI 7 application layer protocols, inheriting from this TCP/IP specifications.

But, from experiment, I've seen two main problems with WebSocket:

  1. It does not support CDN;
  2. It has potential security issues.

Continue reading...

2014, Monday August 11

Cross-Platform mORMot Clients - Smart Mobile Studio

Current version of the main framework units target only Win32 and Win64 systems.

It allows to make easy self-hosting of mORMot servers for local business applications in any corporation, or pay cheap hosting in the Cloud, since mORMot CPU and RAM expectations are much lower than a regular IIS-WCF-MSSQL-.Net stack.
But in a Service-Oriented Architecture (SOA), you would probably need to create clients for platforms outside the Windows world, especially mobile devices.

A set of cross-platform client units is therefore available in the CrossPlatform sub-folder of the source code repository. It allows writing any client in modern object pascal language, for:

  • Any version of Delphi, on any platform (Mac OSX, or any mobile supported devices);
  • FreePascal Compiler 2.7.1;
  • Smart Mobile Studio 2.1, to create AJAX or mobile applications (via PhoneGap, if needed).

This series of articles will introduce you to mORMot's Cross-Platform abilities:

Any feedback is welcome in our forum, as usual!

Continue reading...

- page 1 of 6