Tag - multithread

Entries feed - Comments feed

2020-05-07

New Multi-thread Friendly Memory Manager for FPC written in x86_64 assembly

As a gift to the FPC community, I just committed a new Memory Manager for FPC.
Check mormot.core.fpcx64mm.pas in our mORMot2 repository.
This is a stand-alone unit for FPC only.

It targets Windows and Linux multi-threaded Service applications - typically mORMot daemons.
It is written in almost pure x86_64 assembly, and some unique tricks in the Delphi/FPC Memory Manager world.

It is based on FastMM4 (not FastMM5), and we didn't follow the path of the FastMM4-AVX version - instead of AVX, we use plain good (non-temporal) SSE2 opcode, and we rely on the mremap API on Linux for very efficient reallocation. Using mremap is perhaps the biggest  benefit of this memory manager - it leverages a killer feature of the Linux kernel for sure. By the way, we directly call the Kernel without the need of the libc.

We tuned our x86_64 assembly a lot, and made it cross-platform (Windows and POSIX). We profiled the multi-threading, especially by adding some additional small blocks for GetMem (which is a less expensive notion of "arenas" as used in FastMM5 and most C allocators), introducing an innovatice and very efficient round-robin of tiny blocks (<128 bytes), and proper spinning for FreeMem and medium blocks.

It runs all our regression tests with huge performance and stability - including multi-threaded tests with almost no slow down: sleep is reported as less than 1 ms during a 1 minute test. It has also been validated on some demanding multi-threaded tasks.

Continue reading

2020-02-17

New move/fillchar optimized sse2/avx asm version

Our Open Source framework includes some optimized asm alternatives to RTL's move() and fillchar(), named MoveFast() and FillCharFast().

We just rewrote from scratch the x86_64 version of those, which was previously taken from third-party snippets.
The brand new code is meant to be more efficient and maintainable. In particular, we switched to SIMD 128-bit SSE2 or 256bit AVX memory access (if available), whereas current version was using 64-bit regular registers. The small blocks (i.e. < 32 bytes) process occurs very often, e.g. when processing strings, so has been tuned a lot. Non temporal instructions (i.e. bypassing the CPU cache) are used for biggest chunks of data. We tested ERMS support, but it was found of no benefit in respect to our optimized SIMD, and was actually slower than our non-temporal variants. So ERMS code is currently disabled in the source, and may be enabled on demand by a conditional.

FPC move() was not bad. Delphi's Win64 was far from optimized - even ERMS was poorly introduced in latest RTL, since it should be triggered only for blocks > 2KB. Sadly, Delphi doesn't support AVX assembly yet, so those opcodes would be available only on FPC.

Resulting numbers are talking by themselves. Working on Win64 and Linux, of course.

Continue reading

2016-01-09

Safe locks for multi-thread applications

Once your application is multi-threaded, concurrent data access should be protected. We already wrote about how debugging multi-thread applications may be hard.
Otherwise, a "race condition" issue may appear: for instance, if two threads modify a variable at the same time (e.g. decrease a counter), values may become incoherent and unsafe to use. Another symptom of broken logic is the "deadlock", by which the whole application appears to be blocked and unresponsive, when two threads have a wrong use of the lock, so are blocking each-others.
On a server system, which is expected to run 24/7 with no maintenance, such issues are to be avoided.

In Delphi, protection of a resource (which may be an object, or any variable) is usually done via Critical Sections.
A critical section is an object used to make sure, that some part of the code is executed only by one thread at a time. A critical section needs to be created/initialized before it can be used and be released when it is not needed anymore. Then, some code is protected using Enter/Leave methods, which would lock its execution: in practice, only a single thread would own the critical section, so only a single thread would be able to execute this code section, and other threads would wait until the lock is released. For best performance, the protected sections should be as small as possible - otherwise the benefit of using threads may be voided, since any other thread would wait for the thread owning the critical section to release the lock.

We will now see that Delphi's TCriticalSection may have potential issues, and what our framework proposes to ease critical section use in your applications.

Continue reading

2015-09-14

Performance issue in NextGen ARC model - much better now

Back in 2013, I found out an implementation weakness in the implementation of ARC weak references in the RTL.
A giant lock was freezing all threads and cores, so would decrease a lot the performance abilities of any ARC application, especially in multi thread.

I just investigated that things are now better.

Continue reading

2015-05-14

Using TSynLog with a lot of threads? PYou should better upgrade your source

We identified and fixed today several issues which may affect applications creating a lot of threads (i.e. not using a thread pool). The symptom was an unexpected access violation, when you reach a multiple of 256 threads count. You should better upgrade to at least revision 1.18.1351 if your  […]

Continue reading

2014-10-24

MVC/MVVM Web Applications

We almost finished implementing a long-standing feature request, introducing MVC / MVVM for Web Applications (like RubyOnRails) in mORMot.
This is a huge step forward, opening new perspectives not only to our framework, but for the Delphi community.
In respect to the existing MVC frameworks for Delphi, our  solution is closer to Delphi On Rails (by the convention-over-configuration pattern) than the Delphi MVC Framework or XMM.
The mORMot point of view is unique, and quite powerful, since it is integrated with other parts of our framework, as its ORM/ODM or interface-based services.
Of course, this is a work in progress, so you are welcome to put your feedback, patches or new features!

We will now explain how to build a MVC/MVVM web application using mORMot, starting from the "30 - MVC Server" sample.
First of all, check the source in our GitHub repository: two .pas files, and a set of minimalist Mustache views.

This little web application publishes a simple BLOG, not fully finished yet (this is a Sample, remember!).
But you can still execute it in your desktop browser, or any mobile device (thanks to a simple Bootstrap-based responsive design), and see the articles list, view one article and its comments, view the author information, log in and out.

This sample is implemented as such:

MVVM Source mORMot
Model MVCModel.pas TSQLRestServerDB ORM over a SQlite3 database
View *.html Mustache template engine in the Views sub-folder
ViewModel MVCViewModel.pas Defined as one IBlogApplication interface

For the sake of the simplicity, the sample will create some fake data in its own local SQlite3 database, the first time it is executed.

Continue reading

2014-07-12

Static class variables are just global variables in disguise

Cape Cod Gunny just wrote a blog article about how to replace a global variable by a static class instance.

But I had to react!
Using such static declaration is just another way of creating a global variable.
This is just a global variable in disguise.
In fact, the generated asm will be just like a global variable!

It encapsulates the global declaration within a class name space, but it is still IMHO a very wrong design.
I've seen so many C# or Java code which used such a pattern (there is no global variable in those languages), and it has the same disadvantages as global variables.
Just like the singleton syndrome
Code is just not re-entrant nor thread-safe.
Nightmare to debug and let evolve.

Continue reading

2014-06-09

Performance comparison from Delphi 6, 7, 2007, XE4 and XE6

Since there was recently some articles about performance comparison between several versions of the Delphi compiler, we had to react, and gives our personal point of view.

IMHO there won't be any definitive statement about this.
I'm always doubtful about any conclusion which may be achieved with such kind of benchmarks.
Asking "which compiler is better?" is IMHO a wrong question.
As if there was some "compiler magic": the new compiler will be just like a new laundry detergent - it will be cleaner and whiter...

Performance is not about marketing.
Performance is an iterative process, always a matter of circumstances, and implementation.

Circumstances of the benchmark itself.
Each benchmark will report only information about the process it measured.
What you compare is a limited set of features, running most of the time an idealized and simplified pattern, which shares nothing with real-world process.

Implementation is what gives performance.
Changing a compiler will only gives you some percents of time change.
Identifying the true bottlenecks of an application via a profiler, then changing the implementation of the identified bottlenecks may give order of magnitudes of speed improvement.
For instance, multi-threading abilities can be achieved by following some simple rules.

With our huge set of regression tests, we have at hand more than 16,500,000 individual checks, covering low-level features (like numerical and text marshaling), or high-level process (like concurrent client/server and database multi-threaded process).

You will find here some benchmarks run with Delphi 6, 7, 2007, XE4 and XE6 under Win32, and XE4 and XE6 under Win64.
In short, all compilers performs more or less at the same speed.
Win64 is a little slower than Win32, and the fastest appears to be Delphi 7, using our enhanced and optimized RTL.

Continue reading

2014-04-28

Mustache Logic-less templates for Delphi - part 3

Mustache is a well-known logic-less template engine.
There is plenty of Open Source implementations around (including in JavaScript, which can be very convenient for AJAX applications on client side, for instance).
For mORMot, we created the first pure Delphi implementation of it, with a perfect integration with other bricks of the framework.

In last part of this series of blog articles, we will introduce the Mustache library included within mORMot source code tree.
You can download this documentation as one single pdf file.

Continue reading

Mustache Logic-less templates for Delphi - part 2

Mustache is a well-known logic-less template engine.
There is plenty of Open Source implementations around (including in JavaScript, which can be very convenient for AJAX applications on client side, for instance).
For mORMot, we created the first pure Delphi implementation of it, with a perfect integration with other bricks of the framework.

In this second part of this series of blog articles, we will introduce the Mustache syntax.
You can download this documentation as one single pdf file.

Continue reading

Mustache Logic-less templates for Delphi - part 1

Mustache is a well-known logic-less template engine.
There is plenty of Open Source implementations around (including in JavaScript, which can be very convenient for AJAX applications on client side, for instance).
For mORMot, we created the first pure Delphi implementation of it, with a perfect integration with other bricks of the framework.

In this first part of this series of blog articles, we will introduce the Mustache design.
You can download this documentation as one single pdf file.

Continue reading

2014-04-07

JavaScript support in mORMot via SpiderMonkey

As we already stated, we finished the first step of integration of the SpiderMonkey engine to our mORMot framework.
Version 1.8.5 of the library is already integrated, and latest official revision will be soon merged, thanks to mpv's great contribution.
It can be seen as stable, since it is already used on production site to serve more than 1,000,000 requests per day.

You can now easily uses JavaScript on both client and server side.
On server side, mORMot's implementation offers an unique concept, i.e. true multi-threading, which is IMHO a huge enhancement when compared to the regular node.js mono-threaded implementation, and its callback hell.
In fact, node.js official marketing states its non-blocking scheme is a plus. It allows to define a HTTP server in a few lines, but huge server applications need JavaScript experts not to sink into a state a disgrace.

Continue reading

2014-03-07

Support of MySQL, DB2 and PostgreSQL

We just tested, benchmarked and validated Oracle MySQL, IBM DB2 and PostgreSQL support for our SynDB database classes and the mORMot's ORM core.
This article will also show all updated results, including our newly introduced multi-value INSERT statement generations, which speed up a lot BATCH insertion.

Stay tuned!

Purpose here is not to say that one library or database is better or faster than another, but publish a snapshot of mORMot persistence layer abilities, depending on each access library.

In this timing, we do not benchmark only the "pure" SQL/DB layer access (SynDB units), but the whole Client-Server ORM of our framework.

Process below includes all aspects of our ORM:

  • Access via high level CRUD methods (Add/Update/Delete/Retrieve, either per-object or in BATCH mode);
  • Read and write access of TSQLRecord instances, via optimized RTTI;
  • JSON marshaling of all values (ready to be transmitted over a network);
  • REST routing, with security, logging and statistic;
  • Virtual cross-database layer using its SQLite3 kernel;
  • SQL on-the-fly generation and translation (in virtual mode);
  • Access to the database engines via several libraries or providers.

In those tests, we just bypassed the communication layer, since TSQLRestClient and TSQLRestServer are run in-process, in the same thread - as a TSQLRestServerDB instance. So you have here some raw performance testimony of our framework's ORM and RESTful core, and may expect good scaling abilities when running on high-end hardware, over a network.

On a recent notebook computer (Core i7 and SSD drive), depending on the back-end database interfaced, mORMot excels in speed, as will show the following benchmark:

  • You can persist up to 570,000 objects per second, or retrieve 870,000 objects per second (for our pure Delphi in-memory engine);
  • When data is retrieved from server or client 38, you can read more than 900,000 objects per second, whatever the database back-end is;
  • With a high-performance database like Oracle, and our direct access classes, you can write 70,000 (via array binding) and read 160,000 objects per second, over a 100 MB network;
  • When using alternate database access libraries (e.g. Zeos, or DB.pas based classes), speed is lower (even if comparable for DB2, MS SQL, PostgreSQL, MySQL) but still enough for most work, due to some optimizations in the mORMot code (e.g. caching of prepared statements, SQL multi-values insertion, direct export to/from JSON, SQlite3 virtual mode design, avoid most temporary memory allocation...).

Difficult to find a faster ORM, I suspect.

Continue reading

2013-12-05

New Open Source Multi-Thread ready Memory Manager: SAPMM

Do you remember this former article about scalability of the Delphi memory manager, in multi-thread execution context?

Our SynScaleMM is still experimental.
But did pretty well, for an experiment!

At first, you can take a look at ScaleMM2, which is more stable, and based on the same ground.

But a new multi-thread friendly memory manager for Delphi just came out.
It is in fact the anonymous (and already famous) "NN memory manager" Primož talked about in his article about string building and memory managers.


(Note that in this article, our SynScaleMM was found to be scaling very well, but on the other hand, Primož did compile its benchmark program in Debug mode, so our TTextWriter was not in good shape: when you compile in Release mode, optimizations and inlining are ON, and our good TTextWriter just flies... See the note at the beginning of the article - this is why I never find those benchmarks very informative. I always prefer profiling from the real world with real useful process… and was never convinced by any such naive benchmark.)

OK, back to our business!

SapMM is an interesting beast.
https://code.google.com/p/sapmm

Sounds like if Alexei (the initial coder) has a C coding background. But that's fine when you have to deal with low-level structures and algorithms, as required by a memory manager. :)
It features everything we may ask for such a piece of code: clear design, optimized code (mostly by inlining process), memory leak reporting, some parameters for tuning.

It is only for Delphi XE (and up) under Win32 by now, but contributors are welcome!
It is used in production since more than half a year, and it passed all FastcodeMM benchmark tests.

If you want a direct link of the today's source code, without SVN, you may try this direct link from our site.
(but it probably will never be updated - you are warned)

Continue reading

2013-11-04

Updated mORMot database benchmark - including MS SQL and PostgreSQL

On an recent notebook computer (Core i7 and SSD drive), depending on the back-end database interfaced, mORMot excels in speed:

  • You can persist up to 570,000 objects per second, or retrieve more than 900,000 objects per second (for our pure Delphi in-memory engine);
  • When data is retrieved from server or client cache, you can read more than 900,000 objects per second, whatever the database back-end is;
  • With a high-performance database like Oracle and our direct access classes, you can write 65,000 (via array binding) and read 160,000 objects per second, over a 100 MB network;
  • When using alternate database access libraries (e.g. Zeos, or DB.pas based classes), speed is lower, but still enough for most work.

Difficult to find a faster ORM, I suspect.

The following tables try to sum up all available possibilities, and give some benchmark (average objects/second for writing or read).

In these tables:

  • 'SQLite3 (file full/off/exc)' indicates use of the internal SQLite3 engine, with or without Synchronous := smOff and/or DB.LockingMode := lmExclusive;
  • 'SQLite3 (mem)' stands for the internal SQLite3 engine running in memory;
  • 'SQLite3 (ext ...)' is about access to a SQLite3 engine as external database - either as file or memory;
  • 'TObjectList' indicates a TSQLRestServerStaticInMemory instance, either static (with no SQL support) or virtual (i.e. SQL featured via SQLite3 virtual table mechanism) which may persist the data on disk as JSON or compressed binary;
  • 'Oracle' shows the results of our direct OCI access layer (SynDBOracle.pas);
  • 'NexusDB' is the free embedded edition, available from official site;
  • 'Zeos *' indicates that the database was accessed directly via the ZDBC layer;
  • 'FireDAC *' stands for FireDAC library;
  • 'UniDAC *' stands for UniDAC library;
  • 'BDE *' when using a BDE connection;
  • 'ODBC *' for a direct access to ODBC;
  • 'Jet' stands for a MSAccess database engine, accessed via OleDB;
  • 'MSSQL local' for a local connection to a MS SQL Express 2008 R2 running instance (this was the version installed with Visual Studio 2010), accessed via OleDB.

This list of database providers is to be extended in the future. Any feedback is welcome!

Numbers are expressed in rows/second (or objects/second). This benchmark was compiled with Delphi 7, so newer compilers may give even better results, with in-lining and advanced optimizations.

Note that these tests are not about the relative speed of each database engine, but reflect the current status of the integration of several DB libraries within the mORMot database access.

Purpose here is not to say that one library or database is better or faster than another, but publish a snapshot of current mORMot persistence layer abilities.

In this timing, we do not benchmark only the "pure" SQL/DB layer access (SynDB units), but the whole Client-Server ORM of our framework: process below includes read and write RTTI access of a TSQLRecord, JSON marshaling, CRUD/REST routing, virtual cross-database layer, SQL on-the-fly translation. We just bypass the communication layer, since TSQLRestClient and TSQLRestServer are run in-process, in the same thread - as a TSQLRestServerDB instance. So you have here some raw performance testimony of our framework's ORM and RESTful core.

You can compile the "15 - External DB performance" supplied sample code, and run the very same benchmark on your own configuration.

Continue reading

2013-10-09

Good old object is not to be deprecated - it is the future

Yes, I know this article title is a huge moment of trolling for most Delphi developer.
But object could be legend... - wait for it - ... dary!

You perhaps already noticed by several blog posts here that I still like the good old (and deprecated) object type, in addition to the common heap-allocated class type.
Plain record with methods does not match the object-oriented approach of object, since it does not feature inheritance.

When you take a look at modern strongly-typed languages, targeting concurrent programming (you know, multi-thread/multi-core execution), you will see that the objects may be allocated in several ways, to facilitate execution flow.

The Rust language for instance is pretty interesting. It has optional task-local Garbage Collection and safe pointer types with region analysis.

To some extent, it is very similar to what object allows in the Delphi world, and why I'm still using/loving it!

Continue reading

2013-07-24

Tempering Garbage Collection

I'm currently fighting against out of memory errors on an heavy-loaded Java server.

If only it had been implemented in Delphi and mORMot!
But at this time, the mORMot was still in its burrow. :)
Copy-On-Write and a good heap manager can do wonders of stability.

Here are some thoughts about Garbage Collector, and how to temper their limitations.
They may apply to both the JVM and the .Net runtime, by the way.

Continue reading

2012-12-20

How to make it fast?

On our forum, a clever question was posted about publishing some enhanced RTL functions for newer versions of Delphi - as we did for Delphi 7 and 2007.

I was looking for a faster IntToStr implementation and discovered SynCommons.pas.
(....)
That's really too bad, SynCommons.pas really does contain some seriously fast stuff, people would greatly benefit from it if it was made general-purpose.

In fact, it would not be enough to change the RTL function implementations.
IMHO, to write something scalable, you need to get rid of such functions.

Continue reading

2012-10-06

Delphi XE3 is preparing (weak) reference counting for class instances

In Delphi, you have several ways of handling data life time, therefore several ways of handling memory:

  • For simple value objects (e.g.  byte integer double shortstring and fixed size arrays or record containing only such types), the value is copied in fixed-size buffers;
  • For more complex value objets (e.g. string and dynamic arrays or record containing such types), there is a reference counter handled by each instance, with copy-on-write feature and compiler-generated reference counting at code scope level (with hidden try..finally blocks);
  • For most class instances (e.g. deriving from TObject), you have to Create then Free each instance, and manage its life time by hand - with explicit try..finally blocks;
  • For class deriving from TInterfacedObject, you have a RefCount property, with _AddRef _Release methods (this is the reference-counted COM model), and you can use Delphi interface to work with such instances - see this blog article.
With Delphi XE3, we were told that some automatic memory handling at class level are about to be introduced at the compiler and RTL level.
Even if this feature is not finished, and disabled, there are a lot of changes in the Delphi XE3 Run Time Library which sounds like a preparation of such a new feature.

Continue reading

2012-06-18

Circular reference and zeroing weak pointers

The memory allocation model of the Delphi interface type uses some kind of Automatic Reference Counting (ARC). In order to avoid memory and resource leaks and potential random errors in the applications (aka the terrible EAccessViolation exception on customer side) when using interface, a SOA framework like mORMot has to offer so-called Weak pointers and Zeroing Weak pointers features.

Note that garbage collector based languages (like Java or C#) do not suffer from this problem, since the circular references are handled by their memory model: objects lifetime are maintained globally by the memory manager. Of course, it will increase memory use, slowdown the process due to additional actions during allocation and assignments (all objects and their references have to be maintained in internal lists), and may slow down the application when garbage collector enters in action. In order to avoid such issues when performance matters, experts tend to pre-allocate and re-use objects: this is one common limitation of this memory model, and why Delphi is still a good candidate (like unmanaged C or C++ - and also Objective C) when it deals with performance and stability.

Continue reading

- page 1 of 2