The bulk of the hack was including the SSE/SSE2 assembly code in a true Delphi unit. The conversion was made difficult because the Delphi compiler doesn't allow code or data to be aligned on 16-byte boundaries, which the SSE/SSE2 instructions used here require. This is a well-known limitation of the Delphi compiler. See http://qc.embarcadero.com/wc/qcmain.aspx?d=1116
A solution was found by copying all the tables used into a memory-allocated buffer, and by creating the TBL_64 table in code. See the JpegDecode() function for the hack.
Some modifications were made to the original code, in order to use these tables from memory, and to allow floating-point usage after decoding (by adding the emms instruction).
Since we use the Win32 VirtualAlloc API for memory allocation (which always returns 16-byte aligned, zero-filled memory, as expected by the code), the TJpegDecode object instance has an unusual constructor: the JpegDecode() function. Don't try to allocate any TJpegDecode object on the stack or via the Delphi heap; use this JpegDecode() function to allocate a PJpegDecode pointer, which will be freed by its Free method.
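The following minimal sketch only illustrates why VirtualAlloc suits the SSE code: committed pages start on a page boundary (a multiple of 16) and are zero-filled by the OS. It is not the unit's actual JpegDecode() code; the helper names AllocZeroAligned and FreeZeroAligned are hypothetical, and a 32-bit compiler is assumed (pointers are cast to Cardinal).

```pascal
program VirtualAllocDemo;
{$APPTYPE CONSOLE}
// Sketch only: hypothetical helpers, 32-bit compiler assumed (Cardinal casts).
uses
  Windows, SysUtils;

// Returns page-aligned (hence 16-byte aligned), zero-filled memory.
function AllocZeroAligned(Size: Cardinal): Pointer;
begin
  Result := VirtualAlloc(nil, Size, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
  if Result = nil then
    RaiseLastOSError;
end;

// Memory from VirtualAlloc must be released with VirtualFree, not FreeMem.
procedure FreeZeroAligned(P: Pointer);
begin
  if P <> nil then
    VirtualFree(P, 0, MEM_RELEASE);
end;

const
  BufSize = 10 * 1024; // about the size of the decoder's internal buffer

var
  Buf: Pointer;
  i: Integer;
  AllZero: Boolean;
begin
  Buf := AllocZeroAligned(BufSize);
  try
    Writeln('16-byte aligned: ', (Cardinal(Buf) and 15) = 0);
    AllZero := True;
    for i := 0 to BufSize - 1 do
      if PByteArray(Buf)^[i] <> 0 then
        AllZero := False;
    Writeln('zero-filled: ', AllZero);
  finally
    FreeZeroAligned(Buf);
  end;
end.
```

Any aligned buffer must be released with the matching API: VirtualAlloc memory goes back through VirtualFree, which is one reason the unit insists on its own Free method instead of the Delphi heap.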
There is no TPicture descendant implementation yet, since it should be more useful to use a resulting TBitmap in your code.
Direct access to the picture bitmap, without creating any TBitmap resource, is allowed via the TJpegDecode.DrawTo() methods, so you can use very big pictures without any resource limitations (under Windows 2000 or XP, allocating big TBitmap instances raises errors).
Decoding is not thread-safe for now; if you really need it, please ask.
There is no save method in this unit: it's a fast decoder, not an encoder.
Tested under Delphi 7 and 2009. At least Delphi 7 should be required, for the SSE2 instruction set in the asm part.
Quick start:
function JpegDecode(Buffer: pointer; BufferLen: integer): TBitmap; overload;
function JpegDecode(Buffer: pointer; BufferLen: cardinal; var pImg: PJpegDecode): TJpegDecodeError; overload;
procedure JpegDraw(Buffer: pointer; BufferLen: integer; Canvas: TCanvas; X,Y: integer);
var pImg: PJpegDecode;
begin
  if JpegDecode(Buffer,BufferLen,pImg)=JPEG_SUCCESS then
  try
    pImg^.DrawTo(Canvas,X,Y);
  finally
    pImg^.Free;
  end;
end;
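The first overload, which returns a TBitmap, can be wrapped in a similar way. This is a minimal sketch that assumes the caller owns the returned TBitmap and that the function returns nil when decoding fails (check the unit source to confirm both). The helper name LoadJpegToImage and the unit name JpegDec in the uses clause are hypothetical.

```pascal
uses
  Classes, Graphics, ExtCtrls,
  JpegDec; // hypothetical unit name: use the unit that declares JpegDecode()

// Hypothetical helper built on the TBitmap-returning overload.
// Assumptions: the caller owns the bitmap, and nil means "decoding failed".
function LoadJpegToImage(const FileName: string; Image: TImage): Boolean;
var
  Stream: TMemoryStream;
  Bmp: TBitmap;
begin
  Result := False;
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile(FileName);
    Bmp := JpegDecode(Stream.Memory, Integer(Stream.Size));
    if Bmp = nil then
      Exit; // the outer finally still frees the stream
    try
      Image.Picture.Bitmap.Assign(Bmp); // TImage keeps its own copy
      Result := True;
    finally
      Bmp.Free; // release the decoder's bitmap once copied
    end;
  finally
    Stream.Free;
  end;
end;
```

For very large pictures, the TBitmap route hits the resource limits mentioned above, so the JpegDraw() approach is the safer choice there.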
1 From Mikel - 18/03/2010, 11:24
There is a memory leak when JpegDecode can't decompress a JPEG file. I use this code:
var
  Bmp: TBitmap;
  pImg: PJpegDecode;
  res: TJpegDecodeError;
begin
  with TMemoryStream.Create do
  try
    LoadFromFile('..\123.JPG');
    res := JpegDecode(Memory, Size, pImg);
    try
      if res = JPEG_SUCCESS then
      begin
        Bmp := pImg^.ToBitmap; // keep a reference so the bitmap can be freed
        try
          Image1.Picture.Bitmap.Assign(Bmp);
        finally
          Bmp.Free;
        end;
      end;
    finally
      if pImg <> nil then
        pImg^.Free;
    end;
  finally
    Free; // frees the TMemoryStream
  end;
end;
I can send you the JPEG file which I've tried to decompress. The standard JPEGImage unit opens this file, but JpegDecode returns JPEG_FORMATNOTSUPPORTED, and pImg is nil after the JpegDecode call.
2 From A. Bouchez - 18/03/2010, 15:45
I've updated the library file; it is now version 1.1.
1. The library is now licensed under an MPL/GPL/LGPL tri-license (not only LGPL);
2. The memory leak has been fixed in this version;
3. It was indeed confirmed that this decoder is not able to decode some kinds of JPEG files; this comes from the original jpegdec code, not from our Delphi conversion. In such cases, the original libjpeg library (i.e. the default jpeg unit) must be used instead.
If I have time these days, I'll try to adapt the x86 SIMD extension for the IJG JPEG library (from MIYASAKA Masaru, see http://cetus.sakura.ne.jp/softlab/j...) into a pure Delphi unit. This version seems more compatible, but is a bit trickier to adapt as a Delphi unit. Stay tuned!
3 From dwrbudr - 19/03/2010, 09:22
I can provide you with at least 50 JPEG files which this library cannot open. If you can convert the JPEG SIMD library, that would be great.
Is it possible to load JPEGs using the fast IDCT scaling to 1/2, 1/4 and 1/8 for faster preview loading? Look at this:
http://jpegclub.org/djpeg/
4 From A.Bouchez - 19/03/2010, 10:32
I could also use the 1/2, 1/4 and 1/8 scaling methods.
Your link is very interesting... the MIYASAKA Masaru version was based on release 6b of the library, which allows only 1/2, 1/4 and 1/8 scaling, not 1/n scaling (introduced in the IJG 8a version).
I'll first try to compile and make a standard unit with IJG 8a, then a SIMD-enabled one based on 6b. But I don't think I'll be able to make a SIMD-enabled version based on 8a...
5 From Ritsaert Hornstra - 20/03/2010, 16:44
You might try the following base class for the main class in your unit. It will automatically align the first field of the class to a 16-byte boundary (or another alignment value), and you won't need the clunky alloc/dealloc routines (which also makes the code more compatible):
type
TAlignedObject = class( TObject )
private
class function GetAlignValue (): Integer; virtual;
class function GetDelta (): Integer; virtual;
public
class function NewInstance (): TObject; override;
procedure FreeInstance; override;
end;
{ TAlignedObject }
procedure TAlignedObject.FreeInstance();
var
P: PInteger;
begin
CleanupInstance();
P := PInteger( Self );
Dec( P );
while P^ = 0 do Dec( P );
FreeMem( P );
end;
class function TAlignedObject.GetAlignValue(): Integer;
begin
// Default to SSE alignment
Result := 16;
end;
class function TAlignedObject.GetDelta(): Integer;
begin
// Make sure that the END of our class is at the aligned boundary, so that fields a user adds
// in a descendant are aligned to GetAlignValue.
Result := -TAlignedObject.InstanceSize();
end;
class function TAlignedObject.NewInstance(): TObject;
var
P: PInteger;
N, X: Integer;
begin
N := GetAlignValue();
GetMem( P, InstanceSize() + N );
P^ := $12345678; // MAGIC
Inc( P );
Dec( N );
X := GetDelta() and N;
while ( Integer( P ) and N <> X ) do begin
P^ := 0;
Inc( P );
end;
Result := InitInstance( P );
end;
type
TBase = class( TAlignedObject )
private
// Here we are 16 byte aligned.
C: array [ 0..63 ] of Single;
S: array [ 0..$7 ] of Single;
X: array [ 0..$F ] of Byte;
D: array [ 0..$7 ] of Byte;
public
constructor Create();
procedure Test();
end;
{ TBase }
constructor TBase.Create();
var
i: Integer;
begin
for i := 0 to 7 do S[ i ] := ( i - 4 ) * 50;
for i := 0 to 15 do X[ i ] := 128;
for i := 0 to 7 do D[ i ] := 0;
Test();
for i := 0 to 7 do D[ i ] := D[ i ] - 128;
end;
procedure TBase.Test();
asm // self = eax.
//movaps xmm7, dqword ptr [ eax + X ] // xmm7 = 128*16 (U8)
movaps xmm0, dqword ptr [ eax + S ] // xmm0 = S0..S3 (Single)
movaps xmm1, dqword ptr [ eax + S + $10 ] // xmm1 = S4..S7 (Single)
cvtps2dq xmm0, xmm0 // xmm0 = S0..S3 (S32)
cvtps2dq xmm1, xmm1 // xmm1 = S4..S7 (S32)
packssdw xmm0, xmm1 // xmm0 = S0..S7 (S16)
packsswb xmm0, xmm0 // xmm0 = S0..S7, S0..S7 (S8)
xorps xmm0, dqword ptr [ eax + X ] // xmm0 = S0..S7, S0..S7 (U8); + 128 for each byte, S8->U8
movlps qword ptr [ eax + D ], xmm0 // Store S0..S7
end;
// test..
procedure TForm7.FormCreate(Sender: TObject);
var
P: TBase;
begin
P := TBase.Create();
try
finally
P.Free();
end;
end;
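To check that this trick actually delivers 16-byte alignment with a given compiler, a quick probe like the one below can be run. It is a sketch assuming the TAlignedObject class above, a 32-bit compiler (Cardinal casts) and SysUtils in the uses clause. TAlignedProbe and CheckAlignedObject are hypothetical names. GetDelta assumes that descendant fields start right after TAlignedObject.InstanceSize bytes. That holds when TObject's instance data is only the 4-byte VMT pointer, as in Delphi 7, but newer compiler versions may add hidden instance data, so a check is cheaper than an assumption.

```pascal
type
  // Hypothetical descendant: Data should land on a 16-byte boundary
  TAlignedProbe = class(TAlignedObject)
  public
    Data: array[0..3] of Single;
  end;

// Raises an exception if any of Count fresh instances is misaligned.
procedure CheckAlignedObject(Count: Integer);
var
  Probe: TAlignedProbe;
  i: Integer;
begin
  for i := 1 to Count do
  begin
    Probe := TAlignedProbe.Create; // goes through the overridden NewInstance
    try
      if (Cardinal(@Probe.Data) and 15) <> 0 then
        raise Exception.CreateFmt('Instance %d: Data is not 16-byte aligned', [i]);
    finally
      Probe.Free; // goes through the overridden FreeInstance
    end;
  end;
end;
```

Calling it with a few hundred instances, for example CheckAlignedObject(500), exercises many different heap addresses returned by GetMem.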
6 From Ritsaert Hornstra - 22/03/2010, 07:54
Monsieur Bouchez,
If Embarcadero added 16-byte alignment, that would be best. Until then, I think you could try the following:
1. Create an object holding all the predefined tables in the initialization section (e.g. expand some data part like your current TBL; here you could also store the result of the CPUID call, e.g. function pointers, anything static). Especially with small JPEG files, you would save some substantial overhead.
2. For each JPEG decode instance, allocate an object and place the code in parameterless methods (only the class instance reference is passed). All parameters are members of your class. This way you can use all member variables and address them through a single register (eax).
3. Give access to the raw decoded bytes (and support formats other than 32-bit RGBx).
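Point 2 relies on Delphi's default register calling convention: on Win32, Self is passed to a method in EAX, so a parameterless asm method can reach every field through that one register. A minimal sketch, assuming a 32-bit compiler; TDecodeState and PixelCount are hypothetical names, not part of the unit:

```pascal
program RegisterSelfDemo;
{$APPTYPE CONSOLE}
// Sketch for a 32-bit (Win32) compiler: Self arrives in EAX under the
// default register calling convention, so fields are [EAX + field offset].
type
  // Hypothetical decoder state: the "parameters" are fields, not arguments
  TDecodeState = class
  public
    Width, Height: Integer;
    function PixelCount: Integer;
  end;

function TDecodeState.PixelCount: Integer;
asm
  mov edx, [eax].TDecodeState.Height // read a field through Self (EAX)
  mov eax, [eax].TDecodeState.Width
  imul eax, edx                      // the Integer result is returned in EAX
end;

var
  State: TDecodeState;
begin
  State := TDecodeState.Create;
  try
    State.Width := 640;
    State.Height := 480;
    Writeln(State.PixelCount); // 640 * 480
  finally
    State.Free;
  end;
end.
```

The same idea scales to SSE code: if the instance is 16-byte aligned, movaps can load table fields directly with [eax + offset].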
What I currently miss in your code is what goes in and out of each routine (which registers), and a description of which constant/data goes where in the 10K structure. I was unable to see the effect of creating a gray-float-to-gray-byte conversion routine, which I tried as a test to get into the code.
NB: About what you said, that only the first field is aligned: yes, that is true, but it is also true for the VirtualAlloc variant. The VirtualAlloc variant also overuses memory (it allocates per $1000 block).
NB: Allocations will always succeed, but the aligning code (with zero and magic fill) assumes an alignment of 4 bytes, which the memory manager guarantees.
The SSE2 iDCT routines are extremely promising!
7 From A.Bouchez - 22/03/2010, 14:12
1. In practice, I don't think there is any overhead in copying the TBL const into the allocated memory. This operation is immediate (just a move), and the time spent in the "offset reallocation" loop is not noticeable. Even the VirtualAlloc overhead you mentioned (4KB) seems irrelevant, because there are only two allocation calls: the first alloc is about 10KB in size (for the whole TJpegDecode internal buffer) and the second is for the RGB buffer.
2. This would not be possible, because the nasm source code I was using is not organized that way (see below).
3. You have access to the raw decoded bytes, via the pRGB pointer in the TJpegDecode structure. See usage in ToBitmap() and DrawTo() methods.
This unit is a raw port of the disassembled .obj files of the original routines, not of the original nasm code: Delphi asm is not powerful enough to assemble these nasm files. So you are perfectly right about the difficulty of guessing "what goes in and out of each routine (which registers) and a description of what constant / data goes where in the 10K structure". You would have to go back to the original nasm code to find out.
Converting the exact nasm source code into pure Delphi asm is possible, but would require much, much more work.
Thanks for your very interesting comments!
8 From Ritsaert Hornstra - 22/03/2010, 20:30
Thank you for your answers. At the moment I am planning to introduce this code to our development team (we are in dire need of high-quality, fast JPEG decode routines for grayscale images and for color images decoded as grayscale). It will take some digging in the assembly code to get everything right before we can use it. Hopefully we (I) can merge the code with our own, but I am not that familiar with SSE code :-), more with MMX and x86 in general. If I create something working and shareable, I will send it to you!
9 From A. Bouchez - 22/03/2010, 20:41
As I said above, if I have time these days, I'll try to adapt the x86 SIMD extension for the IJG JPEG library (from MIYASAKA Masaru, see http://cetus.sakura.ne.jp/softlab/j... ) into a pure Delphi unit. This version seems more compatible, but is a bit trickier to adapt as a Delphi unit. If you have some time, just look into it. It should compile with C++ Builder, so it should then not be too difficult to interface as a Delphi unit.
10 From B.Kirchhoff - 31/03/2010, 08:10
Hello,
great source.
Is there any possibility of adding it to a Borland C++ project? If I try, I get the message that 'object' is unknown and not supported.
The line with the problem is:
TJpegDecode = object
Thanks in advance, Bernd
11 From A.Bouchez - 31/03/2010, 10:38
"object" is a "record" type with methods.
Just rename object to record, and retry. It may work.
I implemented the object type instead of a class because a class instance needs to know its own internal type (its VMT pointer), and an object doesn't.
Another possibility is to make a normal Delphi class wrapping the PJpegDecode pointer.
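The distinction behind this answer can be seen in a small console program: an old-style object without virtual methods has no hidden VMT pointer, so its layout matches a record with the same fields, whereas a class instance always carries hidden data. This is a sketch with hypothetical type names; records with methods require Delphi 2006 or later.

```pascal
program ObjectVsRecord;
{$APPTYPE CONSOLE}
// Sketch: an old-style object without virtual methods is laid out like a
// record, which is why raw memory can be used for it directly.
type
  TPixObj = object          // old-style object, no virtual methods
    R, G, B, A: Byte;
    function Luma: Integer;
  end;

  TPixRec = record          // advanced record (Delphi 2006+)
    R, G, B, A: Byte;
    function Luma: Integer;
  end;

  TPixCls = class           // class instance: hidden VMT pointer first
    R, G, B, A: Byte;
  end;

function TPixObj.Luma: Integer;
begin
  // Integer approximation of BT.601 luma: (77 R + 150 G + 29 B) / 256
  Result := (R * 77 + G * 150 + B * 29) shr 8;
end;

function TPixRec.Luma: Integer;
begin
  Result := (R * 77 + G * 150 + B * 29) shr 8;
end;

var
  O: TPixObj;   // usable without any constructor call
  Rc: TPixRec;
begin
  Writeln('SizeOf(object)     = ', SizeOf(TPixObj));
  Writeln('SizeOf(record)     = ', SizeOf(TPixRec));
  Writeln('class InstanceSize = ', TPixCls.InstanceSize);
  O.R := 255; O.G := 255; O.B := 255;
  Rc.R := 255; Rc.G := 255; Rc.B := 255;
  Writeln(O.Luma, ' ', Rc.Luma);
end.
```

Because the object and the record share the same layout and method syntax, renaming `object` to `record` is a plausible first attempt for compilers that reject the old-style keyword.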
12 From A.Bouchez - 19/05/2010, 08:32
I didn't know this one.
Is there any Delphi translation of this library?
If not, I'll try to do it... it seems very promising.
13 From Sara - 09/06/2010, 16:30
The error occurs when loading large JPEG images.
Width: 8,208, Height: 10,944, 24-bit
Is there a workaround?
OS: Windows XP
14 From A.Bouchez - 12/06/2010, 09:01
Some images cannot be read by this code. So only use it if you are sure of the source of your pictures.
15 From sara - 12/06/2010, 13:04
The same image can be read under Windows 7. It seems to be an internal resource shortage.
res:=JpegDecode(Memory,Size,pImg);
if res=JPEG_SUCCESS then
try
Image1.Picture.Bitmap.Assign(pImg^.ToBitmap); << Error
finally
16 From A.Bouchez - 13/06/2010, 13:12
Indeed, it's not an error in our JPEG library, but a resource shortage in Windows and in the VCL. Contact Microsoft or Embarcadero for a fix!!!!
More seriously, such big pictures should be handled directly in memory, not by using Windows bitmap resources. It's a bit difficult, but it works.
If you just want to draw it on screen, don't use TImage or TBitmap, but use direct drawing with the DrawTo method. See also the JpegDraw procedure of the unit.
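Here is a minimal sketch of that approach for a viewer form: the picture is decoded once with JpegDecode() and painted with DrawTo() on every repaint, so no TBitmap is ever created. TViewerForm, its PaintBox and OpenJpeg are hypothetical (the form would come with its .dfm, with the OnPaint and OnDestroy events wired to the handlers), and the unit name JpegDec is an assumption. Only JpegDecode, JPEG_SUCCESS, PJpegDecode, DrawTo and Free come from the unit, used with the signatures from the quick start.

```pascal
unit ViewerForm;

interface

uses
  Classes, Forms, ExtCtrls,
  JpegDec; // assumption: the unit that declares JpegDecode()

type
  TViewerForm = class(TForm)
    PaintBox: TPaintBox;
    procedure PaintBoxPaint(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
  private
    FImg: PJpegDecode; // decoded picture, nil when nothing is loaded
  public
    function OpenJpeg(const FileName: string): Boolean;
  end;

implementation

{$R *.dfm}

function TViewerForm.OpenJpeg(const FileName: string): Boolean;
var
  Stream: TMemoryStream;
begin
  if FImg <> nil then
  begin
    FImg^.Free; // release the previous picture
    FImg := nil;
  end;
  Stream := TMemoryStream.Create;
  try
    Stream.LoadFromFile(FileName);
    Result := JpegDecode(Stream.Memory, Cardinal(Stream.Size), FImg) = JPEG_SUCCESS;
    if not Result then
      FImg := nil; // assumes the 1.1 fix frees everything on failure
  finally
    // assumption: DrawTo only needs the decoded RGB buffer, not the JPEG data
    Stream.Free;
  end;
  PaintBox.Invalidate;
end;

procedure TViewerForm.PaintBoxPaint(Sender: TObject);
begin
  if FImg <> nil then
    FImg^.DrawTo(PaintBox.Canvas, 0, 0); // draws without any TBitmap
end;

procedure TViewerForm.FormDestroy(Sender: TObject);
begin
  if FImg <> nil then
    FImg^.Free;
end;

end.
```

Keeping the decoded PJpegDecode alive for the lifetime of the form trades memory for speed: repaints never decode again. For a one-shot draw, the JpegDraw procedure is simpler.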