To content | To menu | To search

Tag - sse42

Entries feed

2015, Tuesday June 30

Faster String process using SSE 4.2 Text Processing Instructions STTNI

A lot of our code, and probably yours, is highly relying on text process.
In our mORMot framework, most of its features use JSON text, encoded as UTF-8.
Profiling shows that a lot of time is spent computing the end of a text buffer, or comparing text content.

You may know that In its SSE4.2 feature set, Intel added STTNI (String and Text New Instructions) opcodes.
They are several new instructions that perform character searches and comparison on two operands of 16 bytes at a time.

I've just committed optimized version of StrComp() and StrLen(), also used for our TDynArrayHashed wrapper.
The patch works from Delphi 5 up to XE8, and with FPC - unknown SSE4.2 opcodes have been entered as hexadecimal bytes, for compatibility with the last century compilers!
The resulting speed up may be worth it!

Next logical step would be to use those instruction in the JSON process itself.
It may speed up the parsing speed of our core functions (which is already very optimized, but written in a classical one-char-at-a-time reading).
Main benefit would be to read the incoming UTF-8 text buffer by blocks of 16 bytes, and performing several characters comparison in a few CPU cycles, with no branching.
Also JSON writing would benefit for it, since escaping could be speed up thanks to STTNI instructions.

Any feedback is welcome, as usual!

2014, Sunday May 25

New crc32c() function using optimized asm and SSE 4.2 instruction

Cyclic Redundancy Check (CRC) codes are widely used for integrity checking of data in fields such as storage and networking.
There is an ever-increasing need for very high-speed CRC computations on processors for end-to-end integrity checks.

We just introduced to mORMot's core unit (SynCommons.pas) a fast and efficient crc32c() function.

It will use either:

  • Optimized x86 asm code, with unrolled loops;
  • SSE 4.2 hardware crc32 instruction, if available.

Resulting speed is very good.
This is for sure the fastest CRC function available in Delphi.
Note that there is a version dedicated to each Win32 and Win64 platform - both performs at the same speed!

In fact, most popular file formats and protocols (Ethernet, MPEG-2, ZIP, RAR, 7-Zip, GZip, and PNG) use the polynomial $04C11DB7, while Intel's hardware implementation is based on another polynomial, $1EDC6F41 (used in iSCSI and Btrfs).
So you would not use this new crc32c() function to replace the zlib's crc32() function, but as a convenient very fast hashing function at application level.
For instance, our TDynArray wrapper will use it for fast items hashing.

Continue reading...