The Challenge and the Algorithms

The original challenge is available at transit-lang-cmp with the original source code, of all those fancy languages and libraries.

In practice, the goal of this test program is to load two big CSVs into memory (80MB + 2MB), then serve over HTTP some JSON generated by route identifiers, joining both CSVs.
The resulting JSON could be of 30KB up to 2MB. And all data is generated on the fly from the CSV in memory.

To be fair, a regular/business coder would have used a database for this. Not silly memory structures. And asked for money to setup a huge set of cloud machines with load balancing. :-)

Reference Implementations in Today Languages

The "modern" / "school" approach, as implemented in the reference project in Go/Rust/C#/... is using two lists for the CSVs data, then two maps/dictionaries between route ID and lists indexes.

  • The Golang version has a good expressiveness, and is nice to read, even if you don't know the language.
  • The C# version is also readable, but making a webserver is still confusing because it is not built from code, but from config files.
  • Elixir is a bit over-complicated to my taste.
  • Scala and TypeScript/Deno versions, are fine to read, but really slow. You may better use a database instead.
  • Just for fun, check the Rust version - do you think Rust is good for big maintainable projects with junior developers?

There was a first attempt to write a FPC version of it, by Leledumbo.
His Source Code repository is a nice pascal conversion of above code. But performance was disappointing. Especially because the standard JSON library can not work directly with high level structures like collections or arrays.

So is Pascal out of the race?
Let's call the mORMot to the rescue!

Following the mORMot Way

For the mORMot version in FPC, I used another approach, with two diverse algorithms:

  • I ensured the lists were sorted in memory, then made a O(log(n)) binary lookup in it;
  • All stored strings were "interned", i.e. the same text was sharing a single string instance, and FPC reference counting did its magic.

There is no low-level tricks like generating the JSON by hand or using complex data structures - data structures are still are high-level, with readable field names and such. The logic and the intent are clearly readable.
We just leveraged the pascal language, and mORMot features. For instance, string interning is part of the framework, if needed.

Please check the source code in our repository.

As a result:

  • Code is still readable, short and efficient (most of the process is done by mORMot, i.e. CSV, searching, JSON);
  • It uses much less memory - 10 times less memory than Go when holding the data, 5 times less memory than Go when serving the data;
  • Performance is as fast as Go, and its very tuned/optimized compiler and RTL.

Algorithms Matters

Main idea was to let the algorithms match the input data and the expected resultset.
As programmers do when programming games. Not as coders do when pissing out business software. ;-)

  • The source code is still pretty readable, thanks to using mORMot efficient TDynArray to map the dynamic array storage, and its CSV and JSON abilities.
  • I guess source is still understandable for out-of-school programmers - much more readable than Rust for instance.

To by fair, I used typed pointers in TScheduler.BuildTripResponse but it is not so hard getting their purpose, and FPC compiles this function into very efficient assembly. I could have used regular dynamic array access with indexes, it would have been slightly slower, but not really easier to follow, nor safer (if we compile with no range checking).

Worth noting that we did not make any specific tuning, like pre-allocating the results with constants, as other frameworks did. We just specified the data, then let mORMot play with it - that's all.
The mORMot RTTI level matches what we expect for modern frameworks: not only some classes to store JSON, but convenient serialization/unserialization using structures like class or record.
Using modern Pascal dynamic arrays and records to define the data structures let the compiler leverage the memory for us, with no need to write any try..finally..Free blocks, and use interfaces. "Manual memory management" with Pascal is not mandatory and can easily be bypassed. Only for the WebServer, we have a Free, which is expected to close it.

Give Me Some Numbers

Here are a performance comparison with Go (FPC on the left, Go on the right):

parsed 1790905 stop times in 968.43ms                         | parsed 1790905 stop times in 3.245251432s
parsed 71091 trips in 39.54ms                                 | parsed 71091 trips in 85.747852ms

running (0m33.4s), 00/50 VUs, 348 complete and 0 interrupted  | running (0m32.3s), 00/50 VUs, 320 complete and 0 interrupted
default ✓ [======================================] 50 VUs  30   default ✓ [======================================] 50 VUs  30

     data_received..................: 31 GB  933 MB/s         |      data_received..................: 31 GB  971 MB/s
     data_sent......................: 3.2 MB 97 kB/s          |      data_sent......................: 3.0 MB 92 kB/s
     http_req_blocked...............: avg=9µs     min=1.09µs  |      http_req_blocked...............: avg=6.77µs  min=1.09µs
     http_req_connecting............: avg=2.95µs  min=0s      |      http_req_connecting............: avg=1.73µs  min=0s     
     http_req_duration..............: avg=47.59ms min=97.28µs |      http_req_duration..............: avg=49.02ms min=123.81µ
       { expected_response:true }...: avg=47.59ms min=97.28µs |        { expected_response:true }...: avg=49.02ms min=123.81µ
     http_req_failed................: 0.00%  ✓ 0           ✗  |      http_req_failed................: 0.00%  ✓ 0          ✗ 3
     http_req_receiving.............: avg=9.66ms  min=15.35µs |      http_req_receiving.............: avg=5.92ms  min=14.76µs
     http_req_sending...............: avg=87.24µs min=5.2µs   |      http_req_sending...............: avg=70.71µs min=5.2µs 
     http_req_tls_handshaking.......: avg=0s      min=0s      |      http_req_tls_handshaking.......: avg=0s      min=0s     
     http_req_waiting...............: avg=37.83ms min=54.74µs |      http_req_waiting...............: avg=43.02ms min=91.84µs
     http_reqs......................: 34452  1032.205528/s    |      http_reqs......................: 31680  981.949476/s
     iteration_duration.............: avg=4.72s   min=3.54s   |      iteration_duration.............: avg=4.86s   min=2.19s 
     iterations.....................: 348    10.426318/s      |      iterations.....................: 320    9.918682/s
     vus............................: 30     min=30        ma |      vus............................: 15     min=15       max
     vus_max........................: 50     min=50        ma |      vus_max........................: 50     min=50       max

So CSV loading was much faster, then the HTTP server performance was almost the same.

No Alzheimer

Here are some numbers about memory consumption:

Upon finished loading the CSV, mORMot only eats 80MB, heck so little. Sounds a bit magical. But during load test, it fluctuates between 250-350MB, upon which it returns to 80MB at the end. The Go version eats 925MB upon finished loading the CSV. During load test, it tops at 1.5GB, returning to 925MB afterwards.

Nice to read. :)

Pascal has a Modern and Capable Ecosystem

This article was not only about Pascal, but about algorithms and libraries.
The challenge was initially about comparing them. Not only as unrealistic micro-benchmarks, or "computer language benchmark games", but as data processing abilities on a real usecase.

And... Pascal is still in the race for sure!
Not only for "old" people like me - I just got 50 years old. ;-)

The more we spread such kind of information, the less people would make jokes about pascal programmers.
Delphi and FPC are as old as Java, so it is time to get the big picture, not following marketing trends.