About the Mac OS X stack alignment, that's a fact I knew about... but since this kind of code is very low-level, and depends on how the original EMB RTL is written and used, I'd better wait for the official EMB implementation before getting into tuning it.
For example, they may change the way a record is copied, by using inline code generated by the compiler at compile time (this is possible, because the type of the record is already known and fixed at that point) instead of the slower _CopyRecord routine, which walks the type information at run time. This would be the most efficient and elegant way of implementing it.

About cross-platform support, the Free Pascal Compiler has a better approach than EMB, and is now much more advanced than the Delphi compiler in that area. IMHO the FPC way of evolving the compiler is better: it doesn't force you to pay for a new version, and they maintain backward compatibility. I really find the EMB Unicode approach not worth it: from the VCL point of view, it was a need, but from the language and compiler point of view, it's a mess.
With FPC, the same code can be compiled (and sometimes cross-compiled) for Mac OS X, Win32, Win64, Linux 32 or 64, on every supported CPU from the latest x86-64 down to a tiny ARM7... you can mix OS, CPU register length (32 or 64), even endianness... that is quite a challenge!

So here is my proposal: why couldn't Embarcadero people use the Free Pascal Compiler as their internal compiler? If EMB sells IDE, frameworks and support, why do they reinvent the wheel, since FPC is there, alive and working? Didn't they include the Oxygene compiler technology in their catalog? Why not include FPC in Delphi 2011? (and release it in 2010... not 2012...)

About cross-platform asm tuning, I think the right approach is PUREPASCAL. That is, whenever you want to code anything in asm, first code it in optimized pascal, then optimize it by hand only if it's worth it (that is, if your profiler identifies that code section as a real bottleneck for your application). But always keep the original pascal code inside {$IfDef PUREPASCAL} conditionals. In all cases, this original pascal code will perform well. For the pointer arithmetic which is needed in this kind of fast pascal code, a few low-level types and defines (like PtrInt or PtrUInt, which follow the pointer size on 64-bit CPUs) will adapt your code to whatever CPU it runs on. That's how we implemented it in our SQLite3 framework.
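To make the PUREPASCAL idea concrete, here is a minimal sketch, not taken from any framework. The SumBytes function is a made-up example, and I'm assuming that the CPU386 define is enough to detect a 32-bit x86 target and that, on the Delphi side, NativeInt is pointer-sized (Delphi 2009 or later). The asm branch is only compiled on 32-bit x86; every other target gets the pascal version.

```pascal
program PurePascalSketch;

{$ifdef FPC}
  {$mode delphi}
  {$ifdef CPU386}
    {$asmmode intel}
  {$endif}
{$else}
  {$APPTYPE CONSOLE}
{$endif}

// Assumption: only 32-bit x86 gets a hand-tuned asm version;
// any other CPU falls back to the pascal code.
{$ifndef CPU386}
  {$define PUREPASCAL}
{$endif}

{$ifndef FPC}
type
  // FPC already declares PtrInt; for Delphi we assume a version
  // where NativeInt has the size of a pointer (2009 and later).
  PtrInt = NativeInt;
{$endif}

// Hypothetical example: sum of all bytes of a memory buffer.
function SumBytes(P: PByte; Len: PtrInt): Cardinal;
{$ifdef PUREPASCAL}
var
  PEnd: PByte;
begin
  Result := 0;
  if Len <= 0 then
    Exit;
  // PtrInt keeps the pointer arithmetic valid on 32-bit and 64-bit CPUs
  PEnd := PByte(PtrInt(P) + Len);
  repeat
    Inc(Result, P^);
    Inc(P);
  until P = PEnd;
end;
{$else}
  assembler; // explicit for FPC; Delphi accepts it for compatibility
asm // register convention: eax = P, edx = Len, result in eax
        push    ebx               // ebx must be preserved
        mov     ecx, eax          // ecx = current pointer
        xor     eax, eax          // Result := 0
        test    edx, edx
        jle     @done             // nothing to do when Len <= 0
@next:  movzx   ebx, byte ptr [ecx]
        add     eax, ebx
        inc     ecx
        dec     edx
        jnz     @next
@done:  pop     ebx
end;
{$endif}

const
  Data: array[0..3] of Byte = (1, 2, 3, 250);

begin
  WriteLn(SumBytes(@Data[0], Length(Data))); // 1 + 2 + 3 + 250 = 256
end.
```

Defining PUREPASCAL globally (in the project options, for instance) forces the pascal branch even on x86, which is handy for checking that both versions return the same results.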

In most parts of your software, optimized pascal code is the key to efficiency. If you want something fast, code it with pointers, and use the Alt-F2 keys to look at the generated asm. For your software core, avoid very high-level constructs (like generics or TList), and write tuned pieces of code. In short: know what you are doing (i.e. how it will compile and be understood by the CPU), and know what you are coding for (i.e. what the purpose of your code is).
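As an illustration of what "code it with pointers" can look like, here is a small sketch of a search over a plain dynamic array of integers, walking it with a PInteger instead of going through a TList. The IndexOfInteger function and the TIntArray type are names made up for this example, and whether it really beats an indexed loop depends on your compiler version, so check the generated asm.

```pascal
program PointerScanSketch;

{$ifdef FPC}
  {$mode delphi}
{$else}
  {$APPTYPE CONSOLE}
{$endif}

type
  // Hypothetical type name, used only for this sketch
  TIntArray = array of Integer;

// Returns the index of Value in Values, or -1 when it is absent.
function IndexOfInteger(const Values: TIntArray; Value: Integer): Integer;
var
  P: PInteger;
  I: Integer;
begin
  P := Pointer(Values); // points to the first item (nil for an empty array)
  for I := 0 to Length(Values) - 1 do
  begin
    if P^ = Value then
    begin
      Result := I;
      Exit;
    end;
    Inc(P); // moves forward by SizeOf(Integer) bytes
  end;
  Result := -1;
end;

var
  A: TIntArray;

begin
  SetLength(A, 4);
  A[0] := 10;
  A[1] := 20;
  A[2] := 30;
  A[3] := 40;
  WriteLn(IndexOfInteger(A, 30)); // found at index 2
  WriteLn(IndexOfInteger(A, 99)); // not found: -1
end.
```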

And don't talk to me about security and pointers. That's marketing. There are a lot of security issues with huge frameworks such as Java or DotNet. You can always use code injection or configuration overwrite, even on a "secure" virtual machine. Just do some Google searches, and be honest. Security is about code quality, best practices and algorithms. Security is not a "magic" built-in feature, whatever marketing people tell you.

Think about Algorithms and Data Structures, take a pen and a sheet of paper, draw some design diagrams, go have a coffee and/or run a few miles with some good music in your headphones (as I like to), before sitting down at your keyboard. You will be able to write much more efficient code.