Here is the resulting code, which should work from Delphi 7 up to 2009 (I don't have the 2010 sources, so I don't know if something was changed in the record RTTI with this version, but I guess not).
procedure _CopyRecord{ dest, source, typeInfo: Pointer };
asm  // faster version by AB
        { ->    EAX pointer to dest     }
        {       EDX pointer to source   }
        {       ECX pointer to typeInfo }
        push ebp
        push ebx
        push esi
        push edi
        movzx ebx,byte ptr [ecx].TTypeInfo.Name[0]
        mov esi,edx                     // esi = source
        mov edi,eax                     // edi = dest
        add ebx,ecx                     // ebx = TFieldTable
        xor eax,eax                     // eax = current offset
        mov ebp,[ebx].TFieldTable.Count // ebp = TFieldInfo count
        mov ecx,[ebx].TFieldTable.Size
        test ebp,ebp
        jz @fullcopy
        push ecx                        // sizeof(record) on stack
        add ebx,offset TFieldTable.Fields[0] // ebx = first TFieldInfo
@next:
        mov ecx,[ebx].TFieldInfo.ValueOffset
        mov edx,[ebx].TFieldInfo.TypeInfo
        sub ecx,eax
        mov edx,[edx]
        jle @nomov
        lea esi,esi+ecx
        lea edi,edi+ecx
        neg ecx
@mov1:
        mov al,[esi+ecx]                // fast copy not destructable data
        mov [edi+ecx],al
        inc ecx
        jnz @mov1
@nomov:
        mov eax,edi
        movzx ecx,[edx]                 // data type
        cmp ecx,tkLString
        je @@LString
        jb @@err
        cmp ecx,tkDynArray
        je @@DynArray
        ja @@err
        jmp dword ptr [ecx*4+@@tab-tkWString*4]
@@Tab:
        dd @@WString,@@Variant,@@Array,@@Record,@@Interface,@@err
@@errv:
        mov al,reVarInvalidOp
        jmp @@err2
@@err:
        mov al,reInvalidPtr
@@err2:
        pop edi
        pop esi
        pop ebx
        pop ebp
        jmp Error
        nop
        // all functions below have esi=source edi=dest
@@Array:
        movzx ecx,byte ptr [edx].TTypeInfo.Name[0]
        push dword ptr [edx+ecx].TFieldTable.Size
        push dword ptr [edx+ecx].TFieldTable.Count
        mov ecx,dword ptr [edx+ecx].TFieldTable.Fields[0]
        mov ecx,[ecx]
        mov edx,esi
        call _CopyArray
        pop eax                         // restore sizeof(Array)
        jmp @@finish
@@Record:
        movzx ecx,byte ptr [edx].TTypeInfo.Name[0]
        mov ecx,[edx+ecx].TFieldTable.Size
        push ecx
        mov ecx,edx
        mov edx,esi
        call _CopyRecord
        pop eax                         // restore sizeof(Record)
        jmp @@finish
        nop;nop;nop
@@Variant:
        mov ecx,[VarCopyProc]
        mov edx,esi
        or ecx,ecx
        jz @@errv
        call ecx
        mov eax,16
        jmp @@finish
        nop;nop;nop
@@Interface:
        mov edx,[esi]
        call _IntfCopy
        jmp @@fin4
        nop
@@DynArray:
        mov ecx,edx                     // ecx=TypeInfo
        mov edx,[esi]
        call _DynArrayAsg
        jmp @@fin4
@@WString:
{$ifndef LINUX}
        mov edx,[esi]
        call _WStrAsg
        jmp @@fin4
        nop;nop
{$endif}
@@LString:
        mov edx,[esi]
        call _LStrAsg
@@fin4:
        mov eax,4
@@finish:
        add esi,eax
        add edi,eax
        add eax,[ebx].TFieldInfo.ValueOffset
        dec ebp                         // any other TFieldInfo?
        lea ebx,ebx+8                   // next TFieldInfo
        jnz @next
        pop ecx                         // ecx= sizeof(record)
@fullcopy:
        mov edx,edi
        sub ecx,eax
        mov eax,esi
        jle @nomov2
        call move
@nomov2:
        pop edi
        pop esi
        pop ebx
        pop ebp
end;
I've tested this code with some unit tests, and IMHO it works fine. The speed increase is noticeable. At least my code is much more readable than the original from Borland/Embarcadero, since I spelled out the field names (TFieldInfo/TFieldTable) and commented the source.
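For readers who find the asm hard to follow, here is a minimal pure-Pascal sketch of the same field-table walk. It copies nothing: it only prints which byte ranges would be copied raw and which fields are managed. Several things here are assumptions, not part of the code above: the type declarations are local copies of what I believe the private system.pas layouts look like on 32-bit Delphi 7 to 2009 (check them against your own system.pas), TSample is a hypothetical record, and the loop assumes every managed field is pointer-sized (string, dynamic array, interface), which does not hold for Variant, nested records or static arrays.

```pascal
program FieldTableWalk;
{$APPTYPE CONSOLE}
{$R-} // Fields is declared [0..0] but indexed up to Count-1

type
  // Assumed layouts, mirroring the private declarations in system.pas
  PPTypeInfo = ^PTypeInfo;
  PTypeInfo = ^TTypeInfo;
  TTypeInfo = packed record
    Kind: Byte;
    Name: ShortString;
  end;
  TFieldInfo = packed record
    TypeInfo: PPTypeInfo;
    Offset: Cardinal;     // called ValueOffset in the asm version
  end;
  PFieldTable = ^TFieldTable;
  TFieldTable = packed record
    X: Word;              // overlays Kind and the length byte of Name
    Size: Cardinal;       // sizeof(record)
    Count: Cardinal;      // number of managed fields
    Fields: array[0..0] of TFieldInfo;
  end;

  TSample = record        // hypothetical record, for illustration only
    Id: Integer;
    Name: string;
    Value: Double;
    Data: array of Byte;
  end;

function GetFieldTable(Info: PTypeInfo): PFieldTable;
begin
  // Same arithmetic as "movzx ebx,byte ptr [ecx].TTypeInfo.Name[0]; add ebx,ecx"
  Result := PFieldTable(PAnsiChar(Info) + Length(Info^.Name));
end;

procedure DumpLayout(Info: PTypeInfo);
var
  Table: PFieldTable;
  I: Integer;
  Current, Gap: Cardinal;
begin
  Table := GetFieldTable(Info);
  Current := 0; // plays the role of eax (current offset) in the asm
  for I := 0 to Integer(Table^.Count) - 1 do
  begin
    Gap := Table^.Fields[I].Offset - Current;
    if Gap > 0 then
      Writeln('raw copy of ', Gap, ' byte(s) from offset ', Current);
    Writeln('managed field at offset ', Table^.Fields[I].Offset,
      ', type kind ', Table^.Fields[I].TypeInfo^^.Kind);
    // Assumption: the managed field is pointer-sized (the @@fin4 case)
    Current := Table^.Fields[I].Offset + SizeOf(Pointer);
  end;
  if Table^.Size > Current then
    Writeln('raw copy of ', Table^.Size - Current,
      ' trailing byte(s) from offset ', Current);
end;

begin
  DumpLayout(TypeInfo(TSample));
end.
```

The gaps reported as "raw copy" are the spans handled by the @mov1 loop and by the final call to move in the asm version.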
If you can guess if my inlined code in @mov1 is faster than a call move, please tell me!
The code and test function can be downloaded from http://synopse.info/files/CopyRecord.pas
15 reactions
1 From A. Bouchez - 23/03/2010, 23:28
Remark:
I had to rename TFieldInfo.Offset into TFieldInfo.ValueOffset in the system.pas unit, since Offset is a special keyword for Delphi asm!
2 From gabr - 24/03/2010, 07:53
Did you test by calling this function directly or by recompiling system.dcu?
3 From A.Bouchez - 24/03/2010, 09:12
I've tested this by recompiling system.dcu. You can make an in-memory patch if you don't want to recompile system.dcu: see the FastCode project for this patching technique.
I've updated the code, in order to make it 100% compatible with Delphi 2009. Available on the same link above.
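As a rough illustration of the in-memory patching idea mentioned above, here is a minimal sketch that overwrites the first five bytes of a function with a relative JMP to a replacement. This is the generic technique as I understand it, not necessarily what FastCode does line for line. OriginalSum, PatchedSum and RedirectCode are hypothetical names; the sketch patches a demo function rather than System._CopyRecord, whose address is not reachable by name from plain Pascal code. It assumes 32-bit x86 Windows, a target function at least five bytes long, and that no other thread is executing the function while it is patched.

```pascal
program PatchDemo;
{$APPTYPE CONSOLE}
uses
  Windows;

var
  PatchedCalls: Integer = 0;

// Hypothetical "slow" routine that we want to replace at run time
function OriginalSum(const Values: array of Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(Values) do
    Inc(Result, Values[I]);
end;

// Replacement: must have the same parameters and calling convention
function PatchedSum(const Values: array of Integer): Integer;
var
  I: Integer;
begin
  Inc(PatchedCalls); // lets us observe that the redirection happened
  Result := 0;
  for I := High(Values) downto 0 do
    Inc(Result, Values[I]);
end;

// Overwrite the entry of Func with "jmp NewFunc" (opcode $E9 + rel32)
function RedirectCode(Func, NewFunc: Pointer): Boolean;
var
  OldProtect: DWORD;
begin
  Result := VirtualProtect(Func, 5, PAGE_EXECUTE_READWRITE, OldProtect);
  if not Result then
    Exit;
  PByte(Func)^ := $E9;
  // rel32 is relative to the address of the instruction after the jmp
  PInteger(PAnsiChar(Func) + 1)^ := Integer(NewFunc) - Integer(Func) - 5;
  VirtualProtect(Func, 5, OldProtect, OldProtect);
  FlushInstructionCache(GetCurrentProcess, Func, 5);
end;

begin
  Writeln('before patch: sum = ', OriginalSum([1, 2, 3]));
  Writeln('patched calls = ', PatchedCalls);
  if RedirectCode(@OriginalSum, @PatchedSum) then
  begin
    Writeln('after patch: sum = ', OriginalSum([1, 2, 3]));
    Writeln('patched calls = ', PatchedCalls);
  end;
end.
```

The original entry bytes are lost in this sketch; a real patch would save them first if it ever needs to be undone.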
4 From gabr - 26/03/2010, 17:15
Didn't yet have time to test your version. I can, however, confirm that _CopyRecord, _CopyObject, and _CopyArray are identical in D2009 and D2010.
5 From A.Bouchez - 26/03/2010, 17:39
I'll try it with D2009, and provide a "patch" version if you are interested in benchmarking it.
Thanks for the feedback about the D2010 version. Good news!
6 From gabr - 26/03/2010, 19:31
'Patch' version would be great!
7 From A. Bouchez - 28/03/2010, 12:15
About the MacOSX stack alignment, that's a fact I knew about...
Please take a look at my answer in a separate post. Your comment is worth a new post in this blog! Thanks!
8 From AdamWu - 13/04/2010, 04:19
> If you can guess if my inlined code in @mov1 is faster than a call move, please tell me!
I think those four instructions can be replaced by a
rep movsb
As for the speed: since it copies data byte by byte, it will be slower than the move() function for large pieces of data, but it should be faster for small pieces since it saves a function call. The break-even size varies depending on the processor, though...
9 From A.Bouchez - 13/04/2010, 09:47
rep movsb is a very bad idea for speed on modern CPUs. That is why I replaced the original rep movsb code with low-level, RISC-aware opcodes. The following loop:
lea esi,esi+ecx
lea edi,edi+ecx
neg ecx
@mov1: mov al,[esi+ecx] // fast copy not destructable data
mov [edi+ecx],al
inc ecx
jnz @mov1
is always faster than
rep movsb
on modern CPUs.
The CPU decodes the rep movsb opcode more slowly than the lower-level byte-by-byte opcodes I wrote. It can execute the first copy instruction in parallel with the second, whereas it cannot do that with the rep movsb instruction.
My only concern is about not using the move procedure; but I think that for most records, the small alignment overhead introduced by move would not be worth it.
Take a look at http://www.intel.com/Assets/PDF/man...
You will discover that rep movsb with ecx>9 has a 50-cycle startup cost (page 60)... The Nehalem architecture improves performance by reducing this latency, but on older CPUs, rep movsb is to be avoided.
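Since the answer depends on the processor, the simplest way to settle it is to measure. Below is a minimal benchmark sketch comparing a byte-by-byte loop in the style of @mov1, a rep movsb version, and the RTL Move procedure on a few small block sizes. The routine names, block sizes and iteration count are arbitrary choices of mine, the Move wrapper adds one extra call, and the timings are raw QueryPerformanceCounter ticks, so treat the numbers as a rough indication only. It assumes 32-bit Delphi on Windows.

```pascal
program CopyBench;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

type
  TCopyProc = procedure(Source, Dest: Pointer; Count: Integer);

const
  Iterations = 1000000; // arbitrary
  Sizes: array[0..4] of Integer = (4, 8, 16, 64, 256);

// Byte-by-byte loop, same pattern as the @mov1 loop
procedure CopyBytesLoop(Source, Dest: Pointer; Count: Integer);
asm // eax=Source edx=Dest ecx=Count
        test ecx,ecx
        jle @done
        push ebx
        add eax,ecx
        add edx,ecx
        neg ecx
@loop:  mov bl,[eax+ecx]
        mov [edx+ecx],bl
        inc ecx
        jnz @loop
        pop ebx
@done:
end;

procedure CopyRepMovsb(Source, Dest: Pointer; Count: Integer);
asm // eax=Source edx=Dest ecx=Count
        test ecx,ecx
        jle @done
        push esi
        push edi
        mov esi,eax
        mov edi,edx
        cld
        rep movsb
        pop edi
        pop esi
@done:
end;

// Thin wrapper so that Move fits the TCopyProc signature
procedure CopyMove(Source, Dest: Pointer; Count: Integer);
begin
  Move(Source^, Dest^, Count);
end;

function TimeIt(Proc: TCopyProc; Size: Integer): Int64;
var
  Src, Dst: array[0..255] of Byte;
  I: Integer;
  T0, T1: Int64;
begin
  for I := 0 to High(Src) do
    Src[I] := I;
  QueryPerformanceCounter(T0);
  for I := 1 to Iterations do
    Proc(@Src, @Dst, Size);
  QueryPerformanceCounter(T1);
  if not CompareMem(@Src, @Dst, Size) then
    raise Exception.Create('copy mismatch');
  Result := T1 - T0; // elapsed performance-counter ticks
end;

var
  I: Integer;
begin
  for I := Low(Sizes) to High(Sizes) do
    Writeln(Sizes[I]:4, ' bytes: loop=', TimeIt(CopyBytesLoop, Sizes[I]),
      ' rep movsb=', TimeIt(CopyRepMovsb, Sizes[I]),
      ' move=', TimeIt(CopyMove, Sizes[I]));
end.
```

Running it on several CPU generations should show where, if anywhere, the inlined loop stops paying off against move.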
10 From AdamWu - 13/04/2010, 10:46
Well, there is also the "fast string" operation for larger pieces of data: the processor moves 64 bytes per iteration, which will then outperform the byte copying.
So I guess the sizes for which rep movsb is undesirable are 9 to 75 bytes. I am not very familiar with the Delphi RTTI: what data is this code trying to copy?
11 From AdamWu - 13/04/2010, 11:15
> I had to rename TFieldInfo.Offset into TFieldInfo.ValueOffset in the system.pas unit, since Offset is a special keyword for Delphi asm!
Sorry for two posts in a row, I was just trying to compile your code.
You can use TFieldInfo.&Offset: the & tells the compiler that the next word is an identifier, not a reserved word.
It works for 2009, and should work for 2010, but I am not sure about lower versions...
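To make the tip concrete, here is a minimal sketch of the & escape inside an asm block. TFieldInfo is redeclared locally with its original field name, and ReadOffset is a hypothetical helper. As said above, this is reported to work with Delphi 2009; whether older compilers accept it is unverified.

```pascal
program AmpersandDemo;
{$APPTYPE CONSOLE}

type
  // Local copy for illustration, keeping the original field name "Offset"
  TFieldInfo = packed record
    TypeInfo: Pointer;
    Offset: Cardinal;
  end;

// Hypothetical helper: returns Info.Offset from inside an asm block.
// A const record larger than 4 bytes is passed by reference, so eax
// holds a pointer to Info under the register calling convention.
function ReadOffset(const Info: TFieldInfo): Cardinal;
asm
        // "&" makes the assembler treat Offset as an identifier,
        // not as the OFFSET operator
        mov eax,[eax].TFieldInfo.&Offset
end;

var
  Info: TFieldInfo;
begin
  Info.TypeInfo := nil;
  Info.Offset := 12;
  Writeln('Offset read from asm: ', ReadOffset(Info));
end.
```

With this escape, the field would not need to be renamed to ValueOffset in system.pas.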
12 From A.Bouchez - 13/04/2010, 16:11
& is a good tip! Thanks!
13 From A.Bouchez - 23/04/2010, 07:40
Thanks Adam for the feedback and these interesting benchmarks. So I guess I should stick to the "move" version of the procedure.