Compiler benchmark from our Open Source ORM framework

When running our unit tests with the Delphi 7, Delphi 2007 and Delphi 2010 compilers, I found some speed improvement between Delphi 7 and Delphi 2007, but nothing noticeable between Delphi 2007 and 2010. Code generated by Delphi 2010 even turned out to be a bit slower, probably due to the overhead of RTTI and some caching issues. I don't have a Delphi XE compiler at hand, but I guess it's much the same as Delphi 2010: AFAIR, Delphi XE was mainly a bug fix release (e.g. about generics).

I spend a lot of time in the asm view (Alt-F2) when I write low-level pascal code, and I use a profiler. So I usually notice any difference between Delphi compiler versions.

IMHO the main improvement in the generated code was the inline keyword for methods and functions/procedures, available in Delphi 2007 and not in Delphi 7. Another improvement was a more aggressive register re-use.
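To make the inlining point concrete, here is a minimal sketch. IsDigit and CountDigits are hypothetical helpers written for this illustration, not code from our framework. The inline directive only asks the compiler to expand the call in place; it remains free to ignore the hint. The {$MODE DELPHI} line is there only so that the same source should also build with Free Pascal.

```pascal
program InlineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the framework): its body is so small
// that the call overhead would dominate, which makes it a typical
// candidate for the inline directive.
function IsDigit(c: AnsiChar): Boolean; inline;
begin
  Result := (c >= '0') and (c <= '9');
end;

// Counts the decimal digits in s. With inlining, the compiler can
// expand IsDigit() inside the loop instead of emitting a call.
function CountDigits(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if IsDigit(s[i]) then
      Inc(Result);
end;

begin
  WriteLn(CountDigits('abc123def45')); // prints 5
end.
```

Comparing the asm view of CountDigits with and without the inline directive is an easy way to see what the compiler actually did with the hint.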

Floating-point generated code is still slow and sometimes awful (the FWAIT instruction is still produced, even though it is no longer necessary, and inlined floating-point code can be even worse than the same code without inlining!).

What is interesting about our framework and all those tests is that they process a lot of data, using the framework's own low-level units, coded in very tuned pascal for best performance. The unit tests provided (more than 5,400,000 individual tests) work on real data (numerical conversion or UTF-8 text processing), with a lot of diverse processes, including low-level conversions, text parsing, object allocations, multi-threading and Client/Server orientation. So here, the code generated by the compiler does make a difference.

The process is run mainly inside our framework's libraries. We use our own RawUTF8 string type, and not the generic string. Therefore, the bottleneck is not the VCL nor the Windows API, but only pure Delphi compiled code. In fact, we avoid most API calls, even for UTF-8 encoding or numerical conversions.

Of course, I ran this benchmark with the PUREPASCAL conditional set, i.e. without the optimized asm parts, relying only on "pure pascal" code.
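As an illustration of how such a conditional typically works, here is a minimal sketch. Swap32 is a hypothetical function made up for this example, not taken from our units. It assumes the register calling convention on 32-bit x86 (parameter and result in EAX), and falls back to the pascal version on any other target.

```pascal
program PurePascalSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Uncomment (or define project-wide) to force the portable version:
// {$DEFINE PUREPASCAL}

// Assumption for this sketch: the asm version is only valid on 32-bit x86.
{$IFNDEF CPU386}{$DEFINE PUREPASCAL}{$ENDIF}

// Hypothetical example: reverse the byte order of a 32-bit value.
function Swap32(value: Cardinal): Cardinal;
{$IFDEF PUREPASCAL}
begin
  // Portable version: this is the code the compiler has to optimize.
  Result := (value shr 24) or ((value shr 8) and $0000FF00) or
            ((value shl 8) and $00FF0000) or ((value and $FF) shl 24);
end;
{$ELSE}
asm
  // Hand-written version: value arrives in EAX, result is returned in EAX.
  bswap eax
end;
{$ENDIF}

begin
  WriteLn(IntToHex(Swap32($11223344), 8)); // prints 44332211
end.
```

With PUREPASCAL set, only the first branch is compiled, so any timing difference between compiler versions comes from the code generator and not from the hand-written asm.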

Feedback from the SynLZ compression unit

Another good experiment about speed was writing and profiling our SynLZ compression unit. With this optimized implementation of an LZ-family compression algorithm, compression is up to 20 times faster than zip, and decompression 3 times faster. In fact, it competes with LZO for compression ratio and decompression speed, but is much faster than LZO for compression: SynLZ is able to compress data at the same rate as it decompresses it. Such a symmetrical implementation is very rare in the compression world.

It involves only integer arithmetic and bit logic, filling and lookup in hash tables, and memory copy.
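To give an idea of the kind of code this means, here is a generic sketch of hash-based match finding, as found in many LZ-family compressors. It is not the SynLZ source: the hash function, the table size, the minimum match length and all names (Hash4, MatchLength, MIN_MATCH) are assumptions chosen for illustration, and a real compressor would emit literal and (offset, length) tokens instead of printing.

```pascal
program LzHashSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  HASH_BITS = 12;                 // illustrative table size: 4096 entries
  HASH_SIZE = 1 shl HASH_BITS;
  MIN_MATCH = 4;                  // shortest match worth encoding

type
  // Last position seen for each hash value, or -1 when the slot is empty.
  THashTable = array[0..HASH_SIZE - 1] of Integer;

// Hash the 4 bytes starting at 0-based position p (bit logic only).
function Hash4(const data: AnsiString; p: Integer): Integer;
var
  v: Cardinal;
begin
  v := Cardinal(Ord(data[p + 1])) or (Cardinal(Ord(data[p + 2])) shl 8) or
       (Cardinal(Ord(data[p + 3])) shl 16) or (Cardinal(Ord(data[p + 4])) shl 24);
  Result := ((v shr 12) xor v) and (HASH_SIZE - 1);
end;

// Length of the common run starting at 0-based positions a and b (a < b).
function MatchLength(const data: AnsiString; a, b: Integer): Integer;
begin
  Result := 0;
  while (b + Result < Length(data)) and
        (data[a + Result + 1] = data[b + Result + 1]) do
    Inc(Result);
end;

var
  table: THashTable;
  data: AnsiString;
  i, h, cand, len: Integer;
begin
  data := 'abcdefgh-abcdefgh-abcd';
  for i := 0 to HASH_SIZE - 1 do
    table[i] := -1;
  i := 0;
  while i + MIN_MATCH <= Length(data) do
  begin
    h := Hash4(data, i);
    cand := table[h];             // hash table lookup
    table[h] := i;                // hash table filling
    len := 0;
    if cand >= 0 then
      len := MatchLength(data, cand, i); // verify: hashes may collide
    if len >= MIN_MATCH then
    begin
      WriteLn('match at ', i, ': offset ', i - cand, ', length ', len);
      Inc(i, len);                // a compressor would emit (offset, length)
    end
    else
      Inc(i);                     // a compressor would emit a literal byte
  end;
end.
```

Decompression is then mostly a memory copy: each (offset, length) token copies bytes from earlier in the output buffer. Loops like these contain nothing but integer registers, shifts and array accesses, which is why register allocation quality shows up directly in the timings.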

We wrote some very tuned pascal code, then compiled it with Delphi 7 and Delphi 2009.

The code generated by Delphi 2009 was noticeably faster than the code generated by Delphi 7. It was indeed better code, with better register reuse.

With hand-tuned assembler, refined through profiling, we achieved even better performance. For instance, a 6 KB XML file is compressed at 14 MB/s using zip, 185 MB/s using LZO, 184 MB/s using the Delphi 2009 pascal version of SynLZ, and 256 MB/s with our final tuned asm version of SynLZ.

Conclusion

For the generation of code involving integer processing, text parsing or memory, I think Delphi XE is faster than Delphi 7, but should be more or less at the same level as Delphi 2007. Inlining was the main new feature, and it can speed up the code a lot.

But for real-world applications, the speed increase won't be noticeable: about 10 or 20% in some specific cases, not more. The algorithm is always the key to better performance. Delphi 7 was already a nice compiler, handling dead code elimination, clever register use and peephole optimization.

For floating-point arithmetic, the Delphi compiler is nowadays outdated. For instance, the current Delphi compiler is outperformed by the latest JavaScript engines, which compile on the fly into SSE: you'll have to code SSE by hand for acceptable results. I hope that SSE code in the upcoming 64 bit compiler will change the results here. According to this comment, the executable generated by FPC 2.5.1 is already faster than Delphi (similar to TraceMonkey, at least), because it handles SSE. Nice for an Open Source compiler! :)


Comments and feedback are welcome in our forum!