Fast JPEG decoder using SSE/SSE2 version 1.2
By A.Bouchez on Wednesday, March 24, 2010, 10:10 - Open Source libraries
The Fast JPEG decoder using SSE/SSE2 library file has been updated, and is now in version 1.2, released under an MPL/GPL/LGPL tri-license. It's mainly a bug-fix release.
Here are the modifications made to the unit:
- resource leak in TJpegDecode.ToBitmap fixed (thanks Esmond for the report)
- potential GPF issue fixed in TJpegDecode.Free
Source code is available from http://synopse.info/files/jpegdec.zip
Comments
Well, looks nice, but... it really outperforms only the standard jpeg unit and the OleLoadPicture API function.
Intel Jpeg Library, "good old" version 1.5 (dated 2001), is 2 times faster (I suppose that IJL uses integer/MMX mode, and therefore operates on less data, etc.).
GdiPlus (Win7) is 1.5 times faster.
And what about the resulting quality?
Integer calculation is always faster, but the visual quality is worse than with floating-point code, which is what the SSE2 version uses.
I also suspect that it depends on the CPU it runs on (some older Intel or AMD CPUs have a slow SSE/SSE2 implementation).
I didn't run many benchmarks, so if you have some, don't hesitate to post them!
In any case, thanks for the feedback. The code size of the jpegdec unit is smaller than that of the jpeg unit.
About GdiPlus, see our implementation (with saving) at http://blog.synopse.info/post/2010/...
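To illustrate the integer versus floating-point trade-off mentioned above, here is a minimal Python sketch. It is not the code of jpegdec, IJL or GDI+; it only assumes a textbook 8-point inverse DCT, computed once in floating point and once with a cosine table rounded to a given number of fractional bits. The function names and the sample coefficients are made up for the example.

```python
import math

N = 8  # JPEG works on 8-point transforms


def basis():
    """Orthonormal DCT basis: b[k][n] = c(k) * cos((2n+1) * k * pi / 16)."""
    rows = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                     for n in range(N)])
    return rows


def idct_float(coeffs):
    """Reference inverse DCT in floating point (result is not rounded)."""
    b = basis()
    return [sum(coeffs[k] * b[k][n] for k in range(N)) for n in range(N)]


def idct_fixed(coeffs, bits):
    """Inverse DCT with the basis rounded to `bits` fractional bits (bits >= 1)."""
    table = [[int(round(v * (1 << bits))) for v in row] for row in basis()]
    half = 1 << (bits - 1)  # added before the shift to round to nearest
    return [(sum(coeffs[k] * table[k][n] for k in range(N)) + half) >> bits
            for n in range(N)]


def max_error(coeffs, bits):
    """Largest distance between the fixed-point and the floating-point output."""
    exact = idct_float(coeffs)
    approx = idct_fixed(coeffs, bits)
    return max(abs(a - e) for a, e in zip(approx, exact))


if __name__ == "__main__":
    coeffs = [620, -135, 48, -22, 9, -4, 2, -1]  # made-up sample coefficients
    for bits in (4, 8, 12):
        print(bits, "fractional bits -> max error", max_error(coeffs, bits))
```

The error is bounded by the sum of the absolute coefficients divided by 2^(bits+1), plus 0.5 for the final rounding, so a coarser integer table leaves more room for the kind of noise discussed in this thread. Real decoders use factored, multi-pass transforms, so their actual error figures will differ.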
Yes, there is a difference in quality. I can't spot it visually, but when I subtract the images from each other I get a noise pattern, with little noise for GDI+ and more for IJL.
It seems that your code uses JPEG "smoothing". IJL can also do this, but I usually disable it for performance reasons. With smoothing enabled, IJL produces the same image as GDI+ and runs 20% slower than your code.
I'm not sure yet that this smoothing is really needed (as I can't see the difference)... maybe I'll try more images.
About the hardware: I've tested on a Core Duo and a Pentium 3, with the same results.
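The image-subtraction check described in this comment can be sketched as follows. This is only an illustration in Python, assuming both decoders have already produced grayscale images of the same size, stored as lists of rows; the helper names and the pixel values are hypothetical.

```python
def diff_image(a, b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def noise_stats(diff):
    """Return (largest difference, mean difference) over all pixels."""
    values = [v for row in diff for v in row]
    return max(values), sum(values) / len(values)


if __name__ == "__main__":
    # Made-up 2x2 outputs standing in for two decoders of the same file.
    decoder_a = [[10, 20], [30, 40]]
    decoder_b = [[12, 20], [29, 44]]
    d = diff_image(decoder_a, decoder_b)
    print(d, noise_stats(d))
```

A difference image that is zero everywhere means the two decoders agree exactly; the larger the maximum and mean values, the more the outputs diverge, even when the difference is invisible to the eye.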
Thanks for these details.
"Smoothing" is not a feature used by most software... you are perfectly right!
That's why I will definitely try to wrap the latest IJG library into a pure Delphi unit, with SIMD extensions.
What about progressive JPEG? As far as I can see, it cannot be loaded?
Sorry if I make mistakes in my English.
I tried to post a link to my test, but it did not go through (antispam protection?). Trying again without "http":
sapersky.narod.ru/files/JpegLoad.zip
Screenshots of the decoded image (JpegDec seems to be noisier):
IJL: i44.tinypic.com/30u4ccp.png
JpegDec: i40.tinypic.com/2na39lu.png
Thanks for the link to the test program. I've seen that you use our SynGdiPlus unit - I'm very proud of that!
It is very difficult to find the noise in the resulting decoded pictures. This is a very subjective task...
Perhaps this SSE2 library is not as attractive as it first looked. Since I had found this asm code, I wanted to convert it into pure Delphi code.
Have you seen the IJL component I posted on my personal web site some years ago?
- TJpegImage class compatible with Borland's jpeg.pas
- on Windows, uses the standard IJG .obj, or the faster IJL15.dll if available
- if IJL fails to load a picture, falls back to the standard IJG code
- on Linux, use native libjpeg.so, without any CLX dependency
http://bouchez.info/myjpeg.html
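The fallback behaviour listed above (try the fast library first, then the standard code if it fails) can be sketched like this. It is a generic Python illustration, not the code of the component; the decoder functions are hypothetical stand-ins that do no real JPEG decoding.

```python
def decode_with_fallback(data, decoders):
    """Try each (name, decode) pair in order and return the first success."""
    errors = []
    for name, decode in decoders:
        try:
            return name, decode(data)
        except Exception as exc:  # a failing decoder must not abort the chain
            errors.append((name, exc))
    raise ValueError("no decoder could load the picture: %r" % (errors,))


def fast_decoder(data):
    """Stand-in for a fast but picky library: rejects anything it dislikes."""
    if not data.startswith(b"ok"):
        raise ValueError("unsupported picture")
    return "fast:" + data.decode("ascii")


def reference_decoder(data):
    """Stand-in for a slower decoder that accepts every input."""
    return "reference:" + data.decode("ascii")


if __name__ == "__main__":
    chain = [("fast", fast_decoder), ("reference", reference_decoder)]
    print(decode_with_fallback(b"ok-picture", chain))
    print(decode_with_fallback(b"odd-picture", chain))
```

Ordering the chain from fastest to most tolerant keeps the common case quick while still loading pictures that the fast library refuses.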
As I stated on that page, there are still potential legal issues with using this old library from Intel. Some companies may not be allowed to use this kind of SOUP in their software... See http://en.wikipedia.org/wiki/Soup_(...)
IJL really has a lot of issues. Another one (that I forgot at first) is that a registry key must be set for full performance. That's not always possible: ordinary users usually have no permission to write to HKLM.
The test has been updated (same link) to set up the registry.
Also some people report problems with multithreading...
But despite all that, it is the fastest freeware library for now.
Another promising way to decode JPEG fast (faster than any SIMD) is to use the GPU (CUDA, OpenCL...), but I hadn't heard of any freeware GPU decoders... well... wait... I googled this one at the last moment:
sourceforge.net/projects/cudajpegdecoder/
Although I can't even run the binaries (a special driver version from NVIDIA is needed).