You could also compare it with the memory manager embedded in the Free Pascal Compiler (FPC).
It also has a per-thread heap, with a different implementation design. And it is now pretty stable and cross-platform!
It uses some nice FPC compiler tricks, like a prefetch() function which is quite unique and powerful when dealing with low-level code such as a memory manager.
The FPC guys did a great job at the compiler level, and they do not forget to optimize the RTL in their work, which is pretty reassuring for the future - if you see what I mean :)

Mono-threaded tests

First of all, we ran all our mORMot regression tests, which consist of a whole range of low-level and high-level tests, including stress tests and concurrent tests.

1. SapMM
Time elapsed for all tests: 27.98s
Peak Private Bytes = 233 408 KB
Virtual Size = 286 112 KB
Peak Working Set = 102 788 KB

2. FastMM4
Time elapsed for all tests: 27.45s
Peak Private Bytes = 39 668 KB
Virtual Size = 92 968 KB
Peak Working Set = 35 236 KB

So, a small speed penalty for SapMM, but almost all those tests are mono-threaded, so it makes sense that FastMM4 is still the winner here.
Memory consumption of SapMM is much bigger, since several thread pools were allocated and released during those tests, so SapMM had to manage several heaps.

But SapMM is very close - congrats!

Multi-threading

Now, some data from our multi-thread tests.
We took the multi-threaded tests included in the above regression suite, but made them 10 times longer (2000 insertions+reads in the DB instead of 200).

They were run on a 4-core i7 CPU, and compiled with Delphi XE4 targeting the Win32 platform (the only one supported by SapMM for now):

1. SapMM
- Create thread pool: 1 assertion passed 2.77ms
- TSQLRestServerDB: 120,022 assertions passed 1.80s
1=38460/s 2=37022/s 5=35766/s 10=37032/s 30=36353/s 50=35737/s
- TSQLRestClientDB: 120,022 assertions passed 1.89s
1=35718/s 2=35437/s 5=33373/s 10=35740/s 30=33346/s 50=34342/s
- TSQLRestClientURINamedPipe: 60,011 assertions passed 3.04s
1=9621/s 2=10212/s 5=12503/s
- TSQLRestClientURIMessage: 80,002 assertions passed 1.91s
1=19494/s 2=22776/s 5=23490/s 10=20981/s
- TSQLHttpClientWinHTTP_HTTPAPI: 119,861 assertions passed 7.63s
1=5156/s 2=7578/s 5=8701/s 10=9314/s 30=9507/s 50=10730/s
- TSQLHttpClientWinSock_WinSock: 119,971 assertions passed 4.74s
1=10990/s 2=12606/s 5=13702/s 10=13840/s 30=14196/s 50=14210/s
Total failed: 0 / 619,890 - Multi thread process PASSED 21.05s
Peak Private Bytes = 198 628 KB
Virtual Size = 248 160 KB
Peak Working Set = 85 328 KB

2. FastMM4
- Create thread pool: 1 assertion passed 4.44ms
- TSQLRestServerDB: 120,022 assertions passed 2.21s
1=40017/s 2=25030/s 5=31335/s 10=24284/s 30=31308/s 50=28787/s
- TSQLRestClientDB: 120,022 assertions passed 2.23s
1=35795/s 2=34538/s 5=31646/s 10=25018/s 30=27057/s 50=24016/s
- TSQLRestClientURINamedPipe: 60,012 assertions passed 3.20s
1=9607/s 2=10242/s 5=10418/s
- TSQLRestClientURIMessage: 80,004 assertions passed 2.98s
1=18035/s 2=19944/s 5=11601/s 10=9950/s
- TSQLHttpClientWinHTTP_HTTPAPI: 119,970 assertions passed 8.02s
1=5156/s 2=7249/s 5=8265/s 10=8135/s 30=8983/s 50=10190/s
- TSQLHttpClientWinSock_WinSock: 119,995 assertions passed 5.76s
1=10814/s 2=10875/s 5=12501/s 10=11365/s 30=12040/s 50=8079/s
Total failed: 0 / 620,026 - Multi thread process PASSED 24.42s
Peak Private Bytes = 23 956 KB
Virtual Size = 79 388 KB
Peak Working Set = 20 356 KB

Of course, due to its per-thread heap, SapMM eats much more memory than FastMM4, even with a thread pool like in those tests.
In the above tests, each protocol creates up to 50 threads for its concurrent client access - and both HTTP servers (http.sys or WinSock) also use a thread pool to handle their clients.
So memory grows a lot. SapMM's peak working set is more than 4 times that of FastMM4, and its peak private bytes more than 8 times. This is worth considering!
It may be even worse if no thread pool were used, but a new thread were created for each client request (e.g. with a DataSnap/Indy server).

Conclusion: it depends on what you need!

For the main mORMot use case (which is to serve HTTP content, preferably using http.sys), the run takes 7.63 seconds with SapMM and 8.02 seconds with FastMM4.
So we would perhaps stick to FastMM4 here, since memory allocation is not the bottleneck.
For direct server-side processing (i.e. TSQLRestServerDB and TSQLRestClientDB), SapMM does an amazing job and keeps throughput nearly flat as the number of concurrent clients grows, whereas FastMM4 lost about 30% of its throughput with 50 client threads.

Apologies for the false positive alert

In fact, those tests helped to (re)identify an issue in our own test code for weak references.
We first thought it was SapMM's fault, but it was not.
In fact, when you use manual weak references - PPointer(@fMyInterfaceField)^ := pointer(aInterface) - you must set the field back to nil when the owner is destroyed - PPointer(@fMyInterfaceField)^ := nil - otherwise you may get an access violation, which was the case with SapMM and with FastMM4 in FullDebugMode (whereas FastMM4 in its default mode was tolerant enough to let the tests continue).
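
To make the rule concrete, here is a minimal sketch of a manual weak reference, under our own assumptions: IParent, TParent and TChild are hypothetical names made up for the illustration (they are not taken from the mORMot test code), and the PPointer(@field)^ spelling is just one way to write the raw pointer copy. It should compile as a console program with Delphi, or with FPC in Delphi mode.

```pascal
program WeakRefSketch;

{$ifdef FPC}{$mode delphi}{$endif}
{$APPTYPE CONSOLE}

type
  // hypothetical interface, for illustration only
  IParent = interface
    function Name: string;
  end;

  TParent = class(TInterfacedObject, IParent)
  public
    function Name: string;
  end;

  TChild = class
  private
    fParent: IParent; // used as a weak reference: set without any _AddRef
  public
    constructor Create(const aParent: IParent);
    destructor Destroy; override;
  end;

function TParent.Name: string;
begin
  Result := 'parent';
end;

constructor TChild.Create(const aParent: IParent);
begin
  inherited Create;
  // copy the raw pointer: the compiler emits no _AddRef call here
  PPointer(@fParent)^ := pointer(aParent);
end;

destructor TChild.Destroy;
begin
  // zero the field by hand: otherwise the compiler-generated cleanup would
  // call _Release on fParent without a matching _AddRef
  PPointer(@fParent)^ := nil;
  inherited Destroy;
end;

var
  obj: TParent;
  parent: IParent;
  child: TChild;
begin
  obj := TParent.Create;
  parent := obj;                   // the only counted reference
  child := TChild.Create(parent);
  WriteLn('RefCount after weak assignment: ', obj.RefCount);
  child.Free;
  WriteLn('RefCount after child.Free: ', obj.RefCount);
  parent := nil;                   // releases the instance; obj is dangling from here
end.
```

Both lines should report a reference count of 1, since the weak field never takes part in the counting. If you remove the nil assignment in TChild.Destroy, the extra _Release either frees the parent too early or touches an already freed instance, which is the kind of error a per-thread heap or a FullDebugMode memory manager will not forgive.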

Some months ago (probably after a long night of work), we had already identified the error, but, out of laziness, we just disabled the test when FullDebugMode was on...
:(

Shame on us!
Never disable a test which does not pass.
Never!

Our test case is fixed now.
There was no problem in our implementation of zeroing weak references - just an incorrect test case implementation.
Sorry Anton for the misunderstanding on our side!