The numbers speak for themselves:

  • under Win32, with a Core i7 CPU: pure pascal: 152 ms - x86: 112 ms (about 1.4x faster)
  • under Win64, with a Core i7 CPU: pure pascal: 202 ms - SSE4: 78 ms (about 2.6x faster)

These timings were obtained by executing the following test code:

for i := 1 to 100000 do begin
  s := SHA256('123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890');
  assert(s='f816ca413da6f2881c0cf16cb6d5bbc5d4189f5a9f185855c8bfd6423e099e52');
end;
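
To reproduce a measurement of this kind, the loop above can be wrapped in a small console program that times it. The following is only a sketch, not the harness actually used for the figures above: it assumes that SHA256() is exported by a unit named SynCrypto and that RawUTF8 is declared in SynCommons (adjust the uses clause to your setup), and the program name, the constant names and the choice of QueryPerformanceCounter as timer are illustrative.

```pascal
program Sha256Bench;

{$APPTYPE CONSOLE}

uses
  Windows,     // QueryPerformanceCounter / QueryPerformanceFrequency
  SysUtils,
  SynCommons,  // assumption: declares RawUTF8
  SynCrypto;   // assumption: declares SHA256()

const
  // same iteration count, input and expected digest as the test loop above
  ITERATIONS = 100000;
  TEST_DATA = '123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890';
  EXPECTED = 'f816ca413da6f2881c0cf16cb6d5bbc5d4189f5a9f185855c8bfd6423e099e52';

var
  i: integer;
  s: RawUTF8;
  start, stop, freq: Int64;
begin
  QueryPerformanceFrequency(freq); // timer ticks per second
  QueryPerformanceCounter(start);
  for i := 1 to ITERATIONS do begin
    s := SHA256(TEST_DATA);
    assert(s = EXPECTED); // compiled out when assertions are disabled
  end;
  QueryPerformanceCounter(stop);
  // convert elapsed ticks to milliseconds
  writeln('elapsed: ', ((stop - start) * 1000) div freq, ' ms');
end.
```

Compiling the same program once for Win32 and once for Win64 should give a pair of timings to compare, although absolute values will depend on the CPU and compiler settings.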

Your feedback is welcome, as usual!