As we already stated here, the Delphi compiler for the Win64 target performs
well, as soon as you bypass the RTL and its sub-optimal implementation -
as we do for mORMot.
In fact, our huge set of regression tests runs only 10%
slower on Win64 than on Win32.
In exchange, we get access to much more memory. This is not a huge gain for a
mORMot server, which uses very little RAM, but it may be useful in
some cases, when you need a lot of structures to be loaded in RAM.
The slowdown on Win64 is mostly due to the bigger pointer size: each pointer uses
twice the memory, hence may generate a larger number of cache misses (failed
attempts to read or write a piece of data in the cache, which result in a main
memory access with much longer latency).
But apart from the RTL, which may need more performance tuning
(this does not seem to be a priority
on the Embarcadero side), the Delphi compiler itself is also sometimes less
efficient when generating code.
For instance, it seems that case ... of ... end
statements do not
generate branch table
instructions on Win64, whereas they do on Win32 - and FPC does for any x64
platform it supports.
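A simple way to check this on your own compiler version is to build a small test case for both targets and look at the generated code (e.g. in the CPU view of the debugger). The sketch below is only an assumption about what a good candidate looks like: Classify is a hypothetical function, and its dense, consecutive labels are the usual pattern for which a compiler would emit a branch table.

```pascal
program CaseDispatch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// hypothetical function: dense, consecutive case labels are the typical
// candidate for a branch (jump) table
function Classify(Value: Integer): Integer;
begin
  case Value of
    0: Result := 3;
    1: Result := 14;
    2: Result := 15;
    3: Result := 92;
    4: Result := 65;
    5: Result := 35;
    6: Result := 89;
    7: Result := 79;
  else
    Result := -1; // any value outside 0..7
  end;
end;

var
  i, Sum: Integer;
begin
  // call the function so that it is not removed by the linker
  Sum := 0;
  for i := 0 to 9 do
    Inc(Sum, Classify(i));
  WriteLn('Sum = ', Sum);
end.
```

With a branch table, the dispatch is one range check followed by one indirect jump. Without it, the compiler falls back to a series of compare-and-jump instructions, whose cost grows with the number of labels.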