Here is the resulting code, which should work from Delphi 7 up to 2009 (I don't have the 2010 sources, so I don't know whether the record RTTI changed in that version, but I guess not).
procedure _CopyRecord{ dest, source, typeInfo: Pointer };
asm // faster version by AB
{ -> EAX pointer to dest }
{ EDX pointer to source }
{ ECX pointer to typeInfo }
push ebp
push ebx
push esi
push edi
movzx ebx,byte ptr [ecx].TTypeInfo.Name[0]
mov esi,edx // esi = source
mov edi,eax // edi = dest
add ebx,ecx // ebx = TFieldTable
xor eax,eax // eax = current offset
mov ebp,[ebx].TFieldTable.Count // ebp = TFieldInfo count
mov ecx,[ebx].TFieldTable.Size
test ebp,ebp
jz @fullcopy
push ecx // sizeof(record) on stack
add ebx,offset TFieldTable.Fields[0] // ebx = first TFieldInfo
@next: mov ecx,[ebx].TFieldInfo.ValueOffset
mov edx,[ebx].TFieldInfo.TypeInfo
sub ecx,eax
mov edx,[edx]
jle @nomov
lea esi,[esi+ecx]
lea edi,[edi+ecx]
neg ecx
@mov1: mov al,[esi+ecx] // fast copy not destructable data
mov [edi+ecx],al
inc ecx
jnz @mov1
@nomov: mov eax,edi
movzx ecx,byte ptr [edx] // data type
cmp ecx,tkLString
je @@LString
jb @@err
cmp ecx,tkDynArray
je @@DynArray
ja @@err
jmp dword ptr [ecx*4+@@tab-tkWString*4]
@@Tab: dd @@WString,@@Variant,@@Array,@@Record,@@Interface,@@err
@@errv: mov al,reVarInvalidOp
jmp @@err2
@@err: mov al,reInvalidPtr
@@err2: pop edi
pop esi
pop ebx
pop ebp
jmp Error
nop // all functions below have esi=source edi=dest
@@Array:
movzx ecx,byte ptr [edx].TTypeInfo.Name[0]
push dword ptr [edx+ecx].TFieldTable.Size
push dword ptr [edx+ecx].TFieldTable.Count
mov ecx,dword ptr [edx+ecx].TFieldTable.Fields[0]
mov ecx,[ecx]
mov edx,esi
call _CopyArray
pop eax // restore sizeof(Array)
jmp @@finish
@@Record:
movzx ecx,byte ptr [edx].TTypeInfo.Name[0]
mov ecx,[edx+ecx].TFieldTable.Size
push ecx
mov ecx,edx
mov edx,esi
call _CopyRecord
pop eax // restore sizeof(Record)
jmp @@finish
nop;nop;nop
@@Variant:
mov ecx,[VarCopyProc]
mov edx,esi
or ecx,ecx
jz @@errv
call ecx
mov eax,16
jmp @@finish
nop;nop;nop
@@Interface:
mov edx,[esi]
call _IntfCopy
jmp @@fin4
nop
@@DynArray:
mov ecx,edx // ecx=TypeInfo
mov edx,[esi]
call _DynArrayAsg
jmp @@fin4
@@WString:
{$ifndef LINUX}
mov edx,[esi]
call _WStrAsg
jmp @@fin4
nop;nop
{$endif}
@@LString:
mov edx,[esi]
call _LStrAsg
@@fin4: mov eax,4
@@finish:
add esi,eax
add edi,eax
add eax,[ebx].TFieldInfo.ValueOffset
dec ebp // any other TFieldInfo?
lea ebx,[ebx+8] // next TFieldInfo
jnz @next
pop ecx // ecx= sizeof(record)
@fullcopy:
mov edx,edi
sub ecx,eax
mov eax,esi
jle @nomov2
call move
@nomov2:pop edi
pop esi
pop ebx
pop ebp
end;
I've tested this source code with some unit tests, and IMHO it works fine. The speed increase is noticeable. At least my code is much more readable than the original from Borland/Embarcadero, since I used the field names (TFieldInfo/TFieldTable) and commented the source.
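For readers unfamiliar with the RTTI layout that the routine walks, here is a minimal sketch in plain Pascal. The record declarations are an assumption: they are meant to mirror the private TTypeInfo, TFieldTable and TFieldInfo records of System.pas in 32-bit Delphi 7 to 2009 (with Offset renamed to ValueOffset, as in the assembly), and TSample is a hypothetical record used only for illustration.

```pascal
program FieldTableDemo;
{$APPTYPE CONSOLE}
{$R-} // Fields is declared [0..0] but indexed beyond 0 below

type
  // Assumed to mirror the private RTTI records of System.pas
  // (32-bit Delphi 7 to 2009), with Offset renamed to ValueOffset.
  PTypeInfo = ^TTypeInfo;
  PPTypeInfo = ^PTypeInfo;
  TTypeInfo = packed record
    Kind: Byte;
    Name: ShortString;
  end;
  TFieldInfo = packed record
    TypeInfo: PPTypeInfo;
    ValueOffset: Cardinal;
  end;
  PFieldTable = ^TFieldTable;
  TFieldTable = packed record
    X: Word;          // padding that accounts for the Kind and length bytes
    Size: Cardinal;   // SizeOf(record)
    Count: Cardinal;  // number of managed fields
    Fields: array[0..0] of TFieldInfo;
  end;

  // Hypothetical record: two plain fields and two managed fields
  TSample = record
    Id: Integer;
    Name: string;
    Flag: Byte;
    Values: array of Integer;
  end;

var
  Info: PTypeInfo;
  Table: PFieldTable;
  i: Integer;
begin
  Info := TypeInfo(TSample);
  // Same step as "add ebx,ecx" in the assembly: skip the type name
  Table := PFieldTable(Integer(Info) + Length(Info^.Name));
  Writeln('record size: ', Table^.Size);
  Writeln('managed fields: ', Table^.Count);
  for i := 0 to Integer(Table^.Count) - 1 do
    Writeln('  offset ', Table^.Fields[i].ValueOffset,
      ', kind ', Table^.Fields[i].TypeInfo^^.Kind);
end.
```

Only the managed fields (strings, dynamic arrays, variants, interfaces, nested records containing them) appear in the table. The bytes between two listed offsets are the plain data that the @mov1 loop copies.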
If you can guess if my inlined code in @mov1 is faster than a call move, please tell me!
The code and test function can be downloaded from http://synopse.info/files/CopyRecord.pas
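The kind of unit test mentioned above could look like the following. This is a minimal sketch, not the test function from the download: TInner and TSample are hypothetical records, and the check relies on the compiler emitting a call to _CopyRecord for the assignment of a record that contains managed fields.

```pascal
program CopyRecordCheck;
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical records mixing plain and managed fields
  TInner = record
    Tag: Byte;
    Caption: string;
  end;
  TSample = record
    Id: Integer;
    Name: string;
    Flag: Byte;
    Values: array of Integer;
    Inner: TInner;
    Tail: array[0..6] of Byte; // plain data after the last managed field
  end;

var
  A, B: TSample;
  i: Integer;
begin
  A.Id := 42;
  A.Name := 'source';
  A.Flag := 7;
  SetLength(A.Values, 3);
  A.Values[2] := 99;
  A.Inner.Tag := 5;
  A.Inner.Caption := 'nested';
  for i := Low(A.Tail) to High(A.Tail) do
    A.Tail[i] := i + 1;

  B := A; // record assignment: this is where _CopyRecord is called

  // Plain fields before, between and after the managed ones
  Assert(B.Id = 42);
  Assert(B.Flag = 7);
  Assert(B.Inner.Tag = 5);
  for i := Low(B.Tail) to High(B.Tail) do
    Assert(B.Tail[i] = i + 1);

  // Managed fields
  Assert(B.Name = 'source');
  Assert(B.Inner.Caption = 'nested');
  Assert(Length(B.Values) = 3);
  Assert(B.Values[2] = 99);

  // Strings are copy-on-write: changing the copy leaves the source intact
  B.Name := 'changed';
  Assert(A.Name = 'source');

  // Dynamic arrays are copied by reference
  Assert(Pointer(B.Values) = Pointer(A.Values));

  Writeln('record copy checks passed');
end.
```

The Tail field is there on purpose: it exercises the final copy done at @fullcopy, after the last managed field.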

15 reactions
1 From A. Bouchez - 23/03/2010, 23:28
Remark:
I had to rename TFieldInfo.Offset to TFieldInfo.ValueOffset in the system.pas unit, since Offset is a reserved word in Delphi asm!
2 From gabr - 24/03/2010, 07:53
Did you test by calling this function directly or by recompiling system.dcu?
3 From A.Bouchez - 24/03/2010, 09:12
I've tested this by recompiling system.dcu. You can make an in-memory patch if you don't want to recompile system.dcu: see the FastCode project for this patching technique.
I've updated the code, in order to make it 100% compatible with Delphi 2009. Available on the same link above.
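For readers who have not seen the in-memory patching idea, here is a minimal sketch of the general technique, assuming 32-bit Windows. It is not the FastCode code: RedirectProc, Original and Replacement are hypothetical names, and the demo redirects an ordinary function rather than System._CopyRecord, whose address is harder to obtain from outside the System unit.

```pascal
program PatchDemo;
{$APPTYPE CONSOLE}
uses
  Windows;

type
  TJump = packed record
    Opcode: Byte;      // $E9 = JMP rel32
    Distance: Integer; // relative to the end of the jump instruction
  end;

// String results keep both routines well over 5 bytes long,
// so the patch cannot spill into the next routine.
function Original: string;
begin
  Result := 'original';
end;

function Replacement: string;
begin
  Result := 'replacement';
end;

// Overwrite the first 5 bytes of OldProc with a jump to NewProc (x86 only)
procedure RedirectProc(OldProc, NewProc: Pointer);
var
  Jump: TJump;
  OldProtect: DWORD;
begin
  Jump.Opcode := $E9;
  Jump.Distance := Integer(NewProc) - Integer(OldProc) - SizeOf(TJump);
  if not VirtualProtect(OldProc, SizeOf(TJump), PAGE_EXECUTE_READWRITE,
    OldProtect) then
    Exit;
  Move(Jump, OldProc^, SizeOf(TJump));
  VirtualProtect(OldProc, SizeOf(TJump), OldProtect, OldProtect);
  FlushInstructionCache(GetCurrentProcess, OldProc, SizeOf(TJump));
end;

begin
  Writeln(Original);
  RedirectProc(@Original, @Replacement);
  Writeln(Original); // the call now lands in Replacement
end.
```

Both routines must share the same signature and calling convention, and the patch should be applied before any other thread can be executing the old routine.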
4 From gabr - 26/03/2010, 17:15
Didn't yet have time to test your version. I can, however, confirm that _CopyRecord, _CopyObject, and _CopyArray are identical in D2009 and D2010.
5 From A.Bouchez - 26/03/2010, 17:39
I'll try it with D2009, and provide a "patch" version if you are interested in benchmarking.
Thanks for the feedback about the D2010 version. Good news!
6 From gabr - 26/03/2010, 19:31
'Patch' version would be great!
7 From A. Bouchez - 28/03/2010, 12:15
About the MacOSX stack alignment, that's a fact I knew about...
Please take a look at my answer in a separate post. Your comment is worth a new post in this blog! Thanks!
8 From AdamWu - 13/04/2010, 04:19
> If you can guess if my inlined code in @mov1 is faster than a call move, please tell me!
I think those four instructions can be replaced by a
rep movsb
As for the speed, since it copies data byte by byte, it will be slower than the move() function for large pieces of data, but for small pieces it should be faster since it saves a function call. The break-even size varies depending on the processor...
9 From A.Bouchez - 13/04/2010, 09:47
rep movsb is a very bad idea for speed on modern CPUs. That is why I replaced the original rep movsb with low-level, RISC-like opcodes. This loop:
lea esi,[esi+ecx]
lea edi,[edi+ecx]
neg ecx
@mov1: mov al,[esi+ecx] // fast copy not destructable data
mov [edi+ecx],al
inc ecx
jnz @mov1
is always faster than
rep movsb
on modern CPUs.
The CPU decodes the rep movsb opcode more slowly than the lower-level byte-by-byte opcodes I wrote. It can execute one copy instruction in parallel with the next, which it cannot do with the rep movsb instruction.
My only concern is about not using the move procedure, but I think that for most records the small alignment overhead introduced by move would not be worth it.
Take a look at http://www.intel.com/Assets/PDF/man...
You will discover that rep movsb with ecx>9 has a 50-cycle startup cost (page 60)... The Nehalem architecture improves performance by reducing this latency, but on older CPUs, rep movsb is to be avoided.
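One way to settle the question for a given machine is to measure it. The following is a minimal sketch under stated assumptions, not a benchmark from this thread: a pure Pascal byte loop stands in for the @mov1 assembly loop, the block sizes and iteration count are arbitrary, and Now has coarse resolution, so the numbers are only indicative.

```pascal
program CopyBench;
{$APPTYPE CONSOLE}
uses
  SysUtils;

const
  Iterations = 2000000; // arbitrary; raise it if the timings are too small
  Sizes: array[0..4] of Integer = (4, 8, 16, 64, 256);

var
  Src, Dst: array[0..255] of Byte;
  k: Integer;

// Inlined byte-by-byte copy, a stand-in for the @mov1 loop
function TimeByteLoop(Size: Integer): Double;
var
  Start: TDateTime;
  n, i: Integer;
begin
  Start := Now;
  for n := 1 to Iterations do
    for i := 0 to Size - 1 do
      Dst[i] := Src[i];
  Result := (Now - Start) * MSecsPerDay;
end;

// Same copy through the RTL Move procedure (one call per block)
function TimeMove(Size: Integer): Double;
var
  Start: TDateTime;
  n: Integer;
begin
  Start := Now;
  for n := 1 to Iterations do
    Move(Src, Dst, Size);
  Result := (Now - Start) * MSecsPerDay;
end;

begin
  for k := Low(Sizes) to High(Sizes) do
    Writeln(Format('%3d bytes: byte loop %7.0f ms, Move %7.0f ms',
      [Sizes[k], TimeByteLoop(Sizes[k]), TimeMove(Sizes[k])]));
end.
```

The sizes worth testing are the gaps between managed fields in typical records, which are usually a few bytes to a few dozen bytes.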
10 From AdamWu - 13/04/2010, 10:46
Well, there is also the "fast string" operation for larger pieces of data: the processor moves 64 bytes per iteration, which will then outperform the byte copying.
So I guess the undesirable sizes for rep movsb are 9~75 bytes. I am not very familiar with the Delphi RTTI: what is the data this code is trying to copy?
11 From AdamWu - 13/04/2010, 11:15
> I had to rename TFieldInfo.Offset to TFieldInfo.ValueOffset in the system.pas unit, since Offset is a reserved word in Delphi asm!
Sorry for two posts in a row, I was just trying to compile your code.
You can use TFieldInfo.&Offset: the & tells the compiler that the next word is an identifier, not a reserved word.
It works for 2009, and should work for 2010, but I am not sure about lower versions...
12 From A.Bouchez - 13/04/2010, 16:11
& is a good tip! Thanks!
13 From A.Bouchez - 23/04/2010, 07:40
Thanks Adam for the feedback and this interesting benchmark. So I guess I should stick with the "move" version of the procedure.