We implemented 64-bit floating point (double) to ASCII conversion using the Grisu-1 algorithm. This clever and very efficient algorithm was published in 2010 by Florian Loitsch, and has become a standard reference for this kind of conversion.

Encoding integers into text is pretty straightforward. But encoding doubles is a real P...A: the IEEE 754 binary format is quite complex, and most decimal values have no exact binary representation.
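To make this concrete, here is a minimal sketch, in Python rather than in the Pascal of our actual code, of what a double looks like from the inside. The `decompose` helper is a hypothetical name used only for this illustration. It splits the 64 bits into sign, integer mantissa and binary exponent, so that the value equals mantissa * 2^exponent.

```python
import struct
from decimal import Decimal

def decompose(value):
    """Split a double into (sign, mantissa, exponent), value = mantissa * 2**exponent.

    Hypothetical helper for illustration only, not part of any library.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exp == 0x7FF:
        raise ValueError("infinity and NaN have no finite expansion")
    if biased_exp == 0:
        # Subnormal numbers: no implicit leading 1 bit.
        return sign, fraction, -1074
    # Normal numbers: restore the implicit 53rd bit and unbias the exponent.
    return sign, fraction | (1 << 52), biased_exp - 1075

sign, mantissa, exponent = decompose(0.1)
print(sign, mantissa, exponent)

# The value actually stored for 0.1 is not 0.1: its exact expansion is long.
print(Decimal(0.1))

# A good conversion prints a short text which still reads back as the same double.
print(repr(0.1), float(repr(0.1)) == 0.1)
```

The hard part of a double-to-text routine is going from that binary mantissa and exponent to a short run of decimal digits which still parses back to the same bits, without resorting to slow arbitrary-precision arithmetic.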

We extracted a cut-down, double-to-ASCII only version of the flt_core.inc, flt_conv.inc and flt_pack.inc files from the FPC RTL, which implement this algorithm.
As usual, we refactored it heavily to reach the best performance, especially on Intel targets, with some dedicated asm and code rewrites.

Some information and numbers extracted from the new source code comments:

With Delphi 10.3 on Win32 (no benefit):
100000 FloatToText in 38.11ms i.e. 2,623,570/s, aver. 0us, 47.5 MB/s
100000 str in 43.19ms i.e. 2,315,082/s, aver. 0us, 50.7 MB/s
100000 DoubleToShort in 45.50ms i.e. 2,197,367/s, aver. 0us, 43.8 MB/s
100000 DoubleToAscii in 42.44ms i.e. 2,356,045/s, aver. 0us, 47.8 MB/s
With Delphi 10.3 on Win64:
100000 FloatToText in 61.83ms i.e. 1,617,233/s, aver. 0us, 29.3 MB/s
100000 str in 53.20ms i.e. 1,879,663/s, aver. 0us, 41.2 MB/s
100000 DoubleToShort in 18.45ms i.e. 5,417,998/s, aver. 0us, 108 MB/s
100000 DoubleToAscii in 18.19ms i.e. 5,496,921/s, aver. 0us, 111.5 MB/s
With FPC on Win32:
100000 FloatToText in 115.62ms i.e. 864,842/s, aver. 1us, 15.6 MB/s
100000 str in 57.30ms i.e. 1,745,109/s, aver. 0us, 39.9 MB/s
100000 DoubleToShort in 23.88ms i.e. 4,187,078/s, aver. 0us, 83.5 MB/s
100000 DoubleToAscii in 23.34ms i.e. 4,284,490/s, aver. 0us, 86.9 MB/s
With FPC on Win64:
100000 FloatToText in 76.92ms i.e. 1,300,052/s, aver. 0us, 23.5 MB/s
100000 str in 27.70ms i.e. 3,609,456/s, aver. 0us, 82.6 MB/s
100000 DoubleToShort in 14.73ms i.e. 6,787,944/s, aver. 0us, 135.4 MB/s
100000 DoubleToAscii in 13.78ms i.e. 7,253,735/s, aver. 0us, 147.2 MB/s
With FPC on Linux x86_64:
100000 FloatToText in 98.47ms i.e. 1,015,465/s, aver. 0us, 18.4 MB/s
100000 str in 38.14ms i.e. 2,621,369/s, aver. 0us, 60 MB/s
100000 DoubleToShort in 14.77ms i.e. 6,766,357/s, aver. 0us, 134.9 MB/s
100000 DoubleToAscii in 13.79ms i.e. 7,248,477/s, aver. 0us, 147.1 MB/s
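
To put these timings into perspective, the following small Python sketch computes how much faster DoubleToAscii is than the RTL str() on each target. The timings are copied from the figures above; the `timings_ms` table and the `speedup` helper are names made up for this illustration.

```python
# (str, DoubleToAscii) timings in milliseconds for 100000 conversions,
# copied from the benchmark figures quoted above.
timings_ms = {
    "Delphi 10.3 Win32": (43.19, 42.44),
    "Delphi 10.3 Win64": (53.20, 18.19),
    "FPC Win32": (57.30, 23.34),
    "FPC Win64": (27.70, 13.78),
    "FPC Linux x86_64": (38.14, 13.79),
}

def speedup(str_ms, ascii_ms):
    """How many times faster DoubleToAscii is than str() for the same work."""
    return str_ms / ascii_ms

for target, (str_ms, ascii_ms) in timings_ms.items():
    print(f"{target:<20} {speedup(str_ms, ascii_ms):.2f}x")
```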

As you can see:

  • Our rewrite is two to almost three times as fast as the original flt_conv.inc from the FPC RTL (str)
  • Delphi Win32 has trouble with 64-bit computation: our code brings no benefit there, since the Delphi RTL has well-optimized x87 asm (which is still slower than our code with FPC/Win32)
  • FPC is more efficient when compiling integer arithmetic; we avoided slow division by calling our Div100(), but Delphi Win64 is still noticeably behind FPC
  • Delphi Win64 has very slow FloatToText and str() implementations (in pure pascal), so our new version is welcome.

In a nutshell, this routine is now used on all platforms (even ARM and AARCH64), with the exception of Delphi Win32, where the built-in x87 asm is a bit faster, mainly due to the performance problems of the Delphi compiler when handling 64-bit logical and arithmetic operations on the i386 CPU.
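
The idea of avoiding slow division, as with the Div100() call mentioned above, can be sketched as follows. This is Python for illustration only and not our actual Pascal/asm code: the `div100` and `uint32_to_text` names, and the two-digit lookup table, are assumptions chosen for the example. The division by 100 is replaced by a multiplication with a precomputed reciprocal and a shift, which is exact for any unsigned 32-bit input.

```python
MAGIC = 0x51EB851F   # ceil(2**37 / 100), a fixed-point reciprocal of 100
SHIFT = 37

def div100(x):
    """Return (x // 100, x % 100) for 0 <= x < 2**32 without dividing."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    q = (x * MAGIC) >> SHIFT   # multiply by the reciprocal, then shift
    r = x - q * 100            # remainder from one more multiplication
    return q, r

# Precomputed "00".."99" strings, so that each step emits two digits at once.
TWO_DIGITS = [f"{i:02d}" for i in range(100)]

def uint32_to_text(x):
    """Convert an unsigned 32-bit integer to decimal text, two digits per step."""
    parts = []
    while x >= 100:
        x, r = div100(x)
        parts.append(TWO_DIGITS[r])
    parts.append(str(x))       # leading chunk (0..99) without zero padding
    return "".join(reversed(parts))

print(div100(1234567))
print(uint32_to_text(4294967295))
```

In a compiled language the multiplication and shift map to a couple of cheap instructions, whereas a hardware division is much slower, and handling two digits per iteration halves the number of steps.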
You can check the source code of our implementation of Grisu. You may find some nice performance tricks.
And any feedback is welcome in our forum, as usual!