The Challenge and the Algorithms
The original challenge is available at transit-lang-cmp with the original source code, of all those fancy languages and libraries.
In practice, the goal of this test program is to load two big CSVs into memory (80MB + 2MB), then serve over HTTP some JSON generated by route identifiers, joining both CSVs.
The resulting JSON could be of 30KB up to 2MB. And all data is generated on the fly from the CSV in memory.
To be fair, a regular/business coder would have used a database for this. Not silly memory structures. And asked for money to setup a huge set of cloud machines with load balancing.
Reference Implementations in Today Languages
The "modern" / "school" approach, as implemented in the reference project in Go/Rust/C#/... is using two lists for the CSVs data, then two maps/dictionaries between route ID and lists indexes.
- The Golang version has a good expressiveness, and is nice to read, even if you don't know the language.
- The C# version is also readable, but making a webserver is still confusing because it is not built from code, but from config files.
- Elixir is a bit over-complicated to my taste.
- Scala and TypeScript/Deno versions, are fine to read, but really slow. You may better use a database instead.
- Just for fun, check the Rust version - do you think Rust is good for big maintainable projects with junior developers?
There was a first attempt to write a FPC version of it, by Leledumbo.
His Source Code repository is a nice pascal conversion of above code. But performance was disappointing. Especially because the standard JSON library can not work directly with high level structures like collections or arrays.
So is Pascal out of the race?
Let's call the mORMot to the rescue!
Following the mORMot Way
For the mORMot version in FPC, I used another approach, with two diverse algorithms:
- I ensured the lists were sorted in memory, then made a O(log(n)) binary lookup in it;
- All stored strings were "interned", i.e. the same text was sharing a single string instance, and FPC reference counting did its magic.
There is no low-level tricks like generating the JSON by hand or using complex data structures - data structures are still are high-level, with readable field names and such. The logic and the intent are clearly readable.
We just leveraged the pascal language, and mORMot features. For instance, string interning is part of the framework, if needed.
Please check the source code in our repository.
As a result:
- Code is still readable, short and efficient (most of the process is done by mORMot, i.e. CSV, searching, JSON);
- It uses much less memory - 10 times less memory than Go when holding the data, 5 times less memory than Go when serving the data;
- Performance is as fast as Go, and its very tuned/optimized compiler and RTL.
Algorithms Matters
Main idea was to let the algorithms match the input data and the expected resultset.
As programmers do when programming games. Not as coders do when pissing out business software.
- The source code is still pretty readable, thanks to using mORMot efficient
TDynArray
to map the dynamic array storage, and its CSV and JSON abilities. - I guess source is still understandable for out-of-school programmers - much more readable than Rust for instance.
To by fair, I used typed pointers in TScheduler.BuildTripResponse
but it is not so hard getting their purpose, and FPC compiles this function into very efficient assembly. I could have used regular dynamic array access with indexes, it would have been slightly slower, but not really easier to follow, nor safer (if we compile with no range checking).
Worth noting that we did not make any specific tuning, like pre-allocating the results with constants, as other frameworks did. We just specified the data, then let mORMot play with it - that's all.
The mORMot RTTI level matches what we expect for modern frameworks: not only some classes to store JSON, but convenient serialization/unserialization using structures like class or record.
Using modern Pascal dynamic arrays and records to define the data structures let the compiler leverage the memory for us, with no need to write any try..finally..Free
blocks, and use interfaces. "Manual memory management" with Pascal is not mandatory and can easily be bypassed. Only for the WebServer, we have a Free
, which is expected to close it.
Give Me Some Numbers
Here are a performance comparison with Go (FPC on the left, Go on the right):
parsed 1790905 stop times in 968.43ms | parsed 1790905 stop times in 3.245251432s parsed 71091 trips in 39.54ms | parsed 71091 trips in 85.747852ms running (0m33.4s), 00/50 VUs, 348 complete and 0 interrupted | running (0m32.3s), 00/50 VUs, 320 complete and 0 interrupted default ✓ [======================================] 50 VUs 30 default ✓ [======================================] 50 VUs 30 data_received..................: 31 GB 933 MB/s | data_received..................: 31 GB 971 MB/s data_sent......................: 3.2 MB 97 kB/s | data_sent......................: 3.0 MB 92 kB/s http_req_blocked...............: avg=9µs min=1.09µs | http_req_blocked...............: avg=6.77µs min=1.09µs http_req_connecting............: avg=2.95µs min=0s | http_req_connecting............: avg=1.73µs min=0s http_req_duration..............: avg=47.59ms min=97.28µs | http_req_duration..............: avg=49.02ms min=123.81µ { expected_response:true }...: avg=47.59ms min=97.28µs | { expected_response:true }...: avg=49.02ms min=123.81µ http_req_failed................: 0.00% ✓ 0 ✗ | http_req_failed................: 0.00% ✓ 0 ✗ 3 http_req_receiving.............: avg=9.66ms min=15.35µs | http_req_receiving.............: avg=5.92ms min=14.76µs http_req_sending...............: avg=87.24µs min=5.2µs | http_req_sending...............: avg=70.71µs min=5.2µs http_req_tls_handshaking.......: avg=0s min=0s | http_req_tls_handshaking.......: avg=0s min=0s http_req_waiting...............: avg=37.83ms min=54.74µs | http_req_waiting...............: avg=43.02ms min=91.84µs http_reqs......................: 34452 1032.205528/s | http_reqs......................: 31680 981.949476/s iteration_duration.............: avg=4.72s min=3.54s | iteration_duration.............: avg=4.86s min=2.19s iterations.....................: 348 10.426318/s | iterations.....................: 320 9.918682/s vus............................: 30 min=30 ma | vus............................: 15 min=15 max vus_max........................: 50 min=50 ma | vus_max........................: 50 min=50 max
So CSV loading was much faster, then the HTTP server performance was almost the same.
No Alzheimer
Here are some numbers about memory consumption:
Upon finished loading the CSV, mORMot only eats 80MB, heck so little. Sounds a bit magical. But during load test, it fluctuates between 250-350MB, upon which it returns to 80MB at the end. The Go version eats 925MB upon finished loading the CSV. During load test, it tops at 1.5GB, returning to 925MB afterwards.
Nice to read. :)
Pascal has a Modern and Capable Ecosystem
This article was not only about Pascal, but about algorithms and libraries.
The challenge was initially about comparing them. Not only as unrealistic micro-benchmarks, or "computer language benchmark games", but as data processing abilities on a real usecase.
And... Pascal is still in the race for sure!
Not only for "old" people like me - I just got 50 years old. ;-)
The more we spread such kind of information, the less people would make jokes about pascal programmers.
Delphi and FPC are as old as Java, so it is time to get the big picture, not following marketing trends.