Thumbnails with glass table reflection in GDI+
Posted by Dan Byström on January 12, 2009
I’ve been playing around with image processing lately and since my last post about loading thumbnail images from files I couldn’t help myself from trying to roll my own “Web 2.0 reflection effect” directly in .NET 2.0 with no 3D support whatsoever. Actually, I think was more inspired by Windows Vista’s thumbnails (to the right) than the web.
This is what I eventually came up with:

Although this is all easy – since there were a few things that couldn’t be done in “pure” GDI+ and then some uncommon approaches involved in my solution, I think that there may be some people out there who don’t find this totally trivial. So I thought that it might be worth writing this down.
From the original picture I work through four steps:

1. The first step merely shrinks the original picture to the desired size and puts a frame around it. This is trivial:
protected virtual Bitmap createFramedBitmap( Bitmap bmpSource, Size szFull )
{
Bitmap bmp = new Bitmap( szFull.Width, szFull.Height );
using ( Graphics g = Graphics.FromImage( bmp ) )
{
g.FillRectangle( FrameBrush, 0, 0, szFull.Width, szFull.Height );
g.DrawRectangle( BorderPen, 0, 0, szFull.Width - 1, szFull.Height - 1 );
g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
g.DrawImage(
bmpSource,
new Rectangle( FrameWidth, FrameWidth, szFull.Width - FrameWidth * 2, szFull.Height - FrameWidth * 2 ),
new Rectangle( Point.Empty, bmpSource.Size ),
GraphicsUnit.Pixel );
}
return bmp;
}
(Note the InterpolationMode property. It is important in order to resize the image with good quality!)
2. The second step is substantially more involved. It takes the result from step 1 and does this, all in one go:
- Flip the image upside down (omitting the upper and lower parts of frame, since we don’t want them to be present in the reflection).
- Apply a Gaussian blur convolution effect to make the image look…, well, blurred…
- Wash out some color to make the reflection a little bit grayish.
- Apply an alpha blend fall out.
Flipping the image and the color wash-out can both be done directly in GDI+. Flip either with Bitmap.RotateFlip or with a transformation matrix and use a ColorMatrix to alter the colors. But since neither a blur effect nor an alpha blend can be done without direct pixel manipulation I did it all in one go. For the blur effect, see Christian Graus excellent article series Image Processing for Dummies with C# and GDI+. I’ve blogged earlier on how to perform alpha blending previously by drawing the blend using a PathGradientBrush or a LinearGradientBrush in Soft edged images in GDI+. This time I will calculate the alpha value instead. The calculation is done once for every scan line and is located in a virtual function so that this formula can be overridden.
All four “effects” are handled in this loop:
for ( int y = height-1 ; y >= 0 ; y-- )
{
byte alpha = (byte)(255 * calculateAlphaFallout( (double)(height - y) / height ));
Pixel* pS = (Pixel*)bdS.Scan0.ToPointer() + bdS.Width * (bdS.Height - y - FrameWidth - 1);
Pixel* pD = (Pixel*)bdD.Scan0.ToPointer() + bdD.Width * y;
for ( int x = bdD.Width ; x > 0 ; x--, pD++, pS++ )
{
int R = gaussBlur( &pS->R, nWidthInPixels );
int G = gaussBlur( &pS->G, nWidthInPixels );
int B = gaussBlur( &pS->B, nWidthInPixels );
pD->R = (byte)((R * 3 + G * 2 + B * 2) / 7);
pD->G = (byte)((R * 2 + G * 3 + B * 2) / 7);
pD->B = (byte)((R * 2 + G * 2 + B * 3) / 7);
pD->A = alpha;
}
}
Flipping happens on line 4, blurring on lines 8-10 (the gaussBlur function is just a one-liner). Color wash-out is done on lines 11-13 and the alpha fall-out on lines 3 and 14.
The very observant reader will notice that I let the blurring wrap from one edge to another. This is a hack, but it works since the left and right edges are always exactly identical. In production code, it might be a good idea to make the blurring optional (or even to provide a user-defined convolution matrix) and also to do the same from the color wash-out which currently uses hard-coded values.
3. Create the “half sheared” bitmap:
This transform cannot be accomplished using a linear transformation, but (after taking quite a a detour on this) I realized that it can be done embarrassingly simple:
using ( Graphics g = Graphics.FromImage( Thumbnail ) ) for ( int x = 0 ; x < sz.Width ; x++ ) g.DrawImage( bmpFramed, new RectangleF( x, 0, 1, sz.Height - Skew * (float)(sz.Width - x) / sz.Width ), new RectangleF( x, 0, 1, sz.Height ), GraphicsUnit.Pixel );
I simply use DrawImage to draw each column by its own, transferring one column from the framed image from step 1 to a column of different height. Note that it is extremely important that we pass floats and not ints – in the latter case the result will be a disaster.
4. Draw the reflection image through a shear transform, like this:
using ( Graphics g = Graphics.FromImage( Thumbnail ) )
{
System.Drawing.Drawing2D.Matrix m = g.Transform;
m.Shear( 0, (float)Skew / sz.Width);
m.Translate( 0, sz.Height - Skew - 1 );
g.Transform = m;
g.DrawImage( bmpReflection, Point.Empty );
}
A Lesson Learned
At first, I tried to do the shearing in both step 3 & 4 myself using code similar to this (still one column at a time):
// this was a bad idea
private static void paintRowWithResize(
BitmapData bdDst,
BitmapData bdSrc,
int nDstColumn,
int nSrcColumn,
int nDstRow,
double dblSrcRow,
int nRows,
double dblStep )
{
unchecked
{
unsafe
{
Pixel* pD = (Pixel*)bdDst.Scan0.ToPointer() + nDstColumn + nDstRow * bdDst.Width;
Pixel* pS = (Pixel*)bdSrc.Scan0.ToPointer() + nSrcColumn;
while ( nRows -- > 0 )
{
int nYSrc = (int)dblSrcRow;
Pixel p1 = pS[nYSrc * bdSrc.Width];
Pixel p2 = p2 = pS[(nYSrc + 1) * bdSrc.Width];
double frac2 = dblSrcRow - nYSrc;
double frac1 = 1.0 - frac2;
pD->R = (byte)(p1.R * frac1 + p2.R * frac2);
pD->G = (byte)(p1.G * frac1 + p2.G * frac2);
pD->B = (byte)(p1.B * frac1 + p2.B * frac2);
pD->A = (byte)(p1.A * frac1 + p2.A * frac2);
dblSrcRow += dblStep;
pD += bdDst.Width;
}
}
}
}
The result looked perfectly good, but after thinking about it for awhile I realized that this piece of code actually is completely and utterly wrong in the general case: it only works when the alpha values of the two adjacent pixels are very close. In other cases the result will be poor.
Perhaps the easiest way to see this is to think about what happens when we want a 50% mix of a completely transparent pixel and a completely opaque while pixel. Intuitively I think it’s clear that we want the result to be a white pixel with 50% transparency. However, if we represent the transparent pixel with (0,0,0,0) (the most common value I’d guess, although (0,x,x,x) is transparent regardless of the value of x) we get a gray half transparent pixel instead (127,127,127,127). Not right at all. The reason I thought my attempt looked good in the first place was just because I had a gray border around the images!
So how do we mix pixels with alpha values? Obviously “normal” alpha blending is not sufficient when we have alpha values on both pixels… after thinking about this for a few minutes, I said to myself “why not ask someone who knows instead?”. That someone is of course Graphics.DrawImage, and so I ended up with much cleaner code. And although I never bothered to figure out how to mix and blend pixels when both pixels contain alpha values I ended up realizing this:
Graphics.DrawImage has quite a bit of work to do when we draw a 32-bit bitmap on top of another. If we have some a-priori knowledge of the nature of the bitmaps we’re working with (are any of them totally opaque?) then it is actually possible to do this ourselves much faster than Graphics.DrawImage has a chance to, because it is forced to work with the general case: both bitmaps may be semi-transparent.
I will get back on how we in some cases can outperform Graphics.DrawImage (when it comes to speed), and hopefully a real life case when we actually bother. Stay tuned. Ekeforshus
Posted in .NET, GDI+, Programming | 11 Comments »
Jeff Minter is back
Posted by Dan Byström on January 12, 2009
The man who once brought us Metagalactic Llamas Battle at the Edge of Time, Sheep in Space and Revenge of the Mutant Camals has returned with:
That feels good to know. Ekeforshus
Posted in Uncategorized | Leave a Comment »
Image.GetThumbnailImage and beyond
Posted by Dan Byström on January 5, 2009
Once I tried to use Image.GetThumbnailImage because I wanted to fire up small thumbnails as fast as possible. So I tried:
// totally and completely, utterly useless
private Bitmap getThumbnailImage( string filename, int width, int height )
{
using ( Image img = Image.FromFile( filename ) )
return (Bitmap)img.GetThumbnailImage( width, height, null, IntPtr.Zero );
}
This turned out to be completely useless, since Image.FromFile first loads the full image so that no performance is gained whatsoever. Trying to google after a solution only resulted in tons of articles saying that Image.GetThumbnailImage is pretty useless and shouldn’t be used. So I dropped it and solved my problem in a completely different way. Now I just stumbled across an overloaded version Image.FromStream which I haven’t noticed before:
Image.FromStream( Stream stream, bool useEmbeddedColorManagement, bool validateImageData )
This opens up for some interesting usages. For example it means that we really can get a thumbnail fast:
// way, way faster, but still pretty useless
private Bitmap getThumbnailImage( string filename, int width, int height )
{
using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
using ( Image img = Image.FromStream( fs, true, false ) )
return (Bitmap)img.GetThumbnailImage( width, height, null, IntPtr.Zero );
}
This is still pretty useless, since this way we really don’t know how to get a proportional thumbnail. This is something that seems to be lacking in GDI+: an easy way to rescale images proportionally. Quite frankly: how often are we interested in non-proportional rescales? Not that often, I’d say! Here’s a better version:
// actually works...
private Bitmap getThumbnailImage( string filename, int width )
{
using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
using ( Image img = Image.FromStream( fs, true, false ) )
return (Bitmap)img.GetThumbnailImage(
width,
width * img.Height / img.Width,
null,
IntPtr.Zero );
}
But if we arm ourselves with a way to rescale images proportionally, something Microsoft apparently decided to leave as an exercise for each and every programmer who wants to do even the simplest things with images in GDI+:
public static Size adaptProportionalSize(
Size szMax,
Size szReal )
{
int nWidth;
int nHeight;
double sMaxRatio;
double sRealRatio;
if ( szMax.Width < 1 || szMax.Height < 1 || szReal.Width < 1 || szReal.Height < 1 )
return Size.Empty;
sMaxRatio = (double)szMax.Width / (double)szMax.Height;
sRealRatio = (double)szReal.Width / (double)szReal.Height;
if ( sMaxRatio < sRealRatio )
{
nWidth = Math.Min( szMax.Width, szReal.Width );
nHeight = (int)Math.Round( nWidth / sRealRatio );
}
else
{
nHeight = Math.Min( szMax.Height, szReal.Height );
nWidth = (int)Math.Round( nHeight * sRealRatio );
}
return new Size( nWidth, nHeight );
}
With that, we can fire up a thumbnail image fast, with a given maximum allowed size while still proportional:
// even better...
private Bitmap getThumbnailImage( string filename, Size szMax )
{
using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
using ( Image img = Image.FromStream( fs, true, false ) )
{
Size sz = adaptProportionalSize( szMax, img.Size );
return (Bitmap)img.GetThumbnailImage(
sz.Width,
sz.Height,
null,
IntPtr.Zero );
}
}
So, it appears that Image.GetThumbnailImage had its use after all! But there’s more we can do with this.
Even though we managed to load thumbnail images fast and proportionally, the quality isn’t particularly good. That’s seems to be the main concern among those who advocate not using Image.GetThumbnailImage at all. If a thumbnail is found in the image file it has already been resized once, and resizing a resized image once more certainly won’t improve the quality, especially if a crappy resizing algorithm is being used. Let’s see what we can do about this. If we’re working with JPG images coming from a digital camera, we can most probably find the “real” thumbnail image like this:
private Bitmap getExifThumbnail( string filename )
{
using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
using ( Image img = Image.FromStream( fs, true, false ) )
{
foreach ( PropertyItem pi in img.PropertyItems )
if ( pi.Id == 20507 )
return (Bitmap)Image.FromStream( new MemoryStream( pi.Value ) );
}
return null;
}
If we can retrieve a thumbnail this way, it will be in its original size and so we can skip an implicit resize. If we want to rescale it anyway, we can do it in a high quality fashion. I don’t know if this is something everybody knows – I had worked with GDI+ for quite some time before I found it – but fact is that resizing an image like this give good performance but crappy result:
// poor image quality Bitmap bmpResized = new Bitmap( bmpOriginal, newWidth, newHeight );
Instead, try the following:
// superior image quality
Bitmap bmpResized = new Bitmap( newWidth, newHeight );
using ( Graphics g = Graphics.FromImage( bmpResized ) )
{
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.DrawImage(
bmpOriginal,
new Rectangle( Point.Empty, bmpResized.Size ),
new Rectangle( Point.Empty, bmpOriginal.Size ),
GraphicsUnit.Pixel );
}
With these code pieces glued together I can now get a thumbnail from a JPG image both faster and with better quality than with my original Image.ImageFromThumbnail attempt!
UPDATE: This technique is used “live” in the demo source code accompanying this post: Thumbnails with glass table reflection in GDI+.
Finally, one more thing that I just come to think of while I was typing this. I have complained in earlier posts that Image.FromFile for some obscure reason keeps the file locked until disposed of. I just realized that there is an easy way around his:
using ( FileStream fs = new FileStream( filename, FileMode.Open ) ) bmp = (Bitmap)Image.FromStream( fs );
Behold – now the image is loaded and the file is NOT locked!
Ekeforshus
Posted in .NET, GDI+, Programming | 11 Comments »
Optimizing away II.3
Posted by Dan Byström on January 1, 2009
Oh, the pain, the pain and the embarrassment…
I just came to realize that although a “long” in C# is 64 bits, in C++ it is still 32 bits. In order to get a 64 bit value in MSVC++ you must type either “long long” or “__int64″. I didn’t know that.
This means that although the assembler function I just presented correctly calculates a 64 bit value, it will be truncated to 32 bits because the surrounding C++ function is declared as a long.
This in turn means that for bitmaps larger than 138 x 138 pixels – the correct result cannot be guaranteed. (With 64 bit values, the bitmap can instead be 9724315 x 9724315 pixels in size before an overflow can occur.)
Unfortunately, although I had unit test to verify the correctness of the function, I only tested with small bitmaps.
I have uploaded a new version. Ekeforshus
Posted in .NET, Programming | 3 Comments »
Optimizing away II.2
Posted by Dan Byström on December 30, 2008
I was asked to upload the source and binary for my last post.
Posted in .NET, Programming | 2 Comments »
Optimizing away II
Posted by Dan Byström on December 22, 2008
Continued from Optimizing away. Ok, now I have worked up the courage.
Prepare yourself for a major disappointment. I really do not know how to tweak that C#-loop to run a nanosecond faster. But I can do the same calculation much faster. How? Just my old favorite party trick. It goes like this:
1. Add a new project to your solution
2. Chose Visual C++ / CLR / Class Library
3. Insert the following managed class:
public ref class FastImageCompare
{
public:
static double compare( void* p1, void* p2, int count )
{
return NativeCode::fastImageCompare( p1, p2, count );
}
static double compare( IntPtr p1, IntPtr p2, int count )
{
return NativeCode::fastImageCompare( p1.ToPointer(), p2.ToPointer(), count );
}
};
4. Insert the following function into an unmanaged class (which I happened to call NativeCode):
unsigned long long NativeCode::fastImageCompare( void* p1, void* p2, int count )
{
int high32 = 0;
_asm
{
push ebx
push esi
push edi
mov esi, p1
mov edi, p2
xor eax, eax
again:
dec count
js done
movzx ebx, [esi]
movzx edx, [edi]
sub edx, ebx
imul edx, edx
movzx ebx, [esi+1]
movzx ecx, [edi+1]
sub ebx, ecx
imul ebx, ebx
add edx, ebx
movzx ebx, [esi+2]
movzx ecx, [edi+2]
sub ebx, ecx
imul ebx, ebx
add edx, ebx
add esi, 4
add edi, 4
add eax, edx
jnc again
inc high32
jmp again
done:
mov edx, high32
pop edi
pop esi
pop ebx
}
}
Yeah. That’s it. Hand tuned assembly language within a .NET Assembly. UPDATE 2009-01-01: return type of the function changed from “unsigned long” to “unsigned long long”, see here.
I guess that’s almost cheating. And we will be locked inside the Intel platform. Most people won’t mind I guess, but other may have very strong feelings about it. If we really would like to exploit this kind of optimizations while still be portable (to Mono/Mac for example) one possibility would be to load the assembly with native code dynamically. If it fails we could fall back to an alternative version written in pure managed code.
(I know from experience that some people with lesser programming skills react to this with a “what? it must be a crappy compiler if you can write faster code by yourself”. Let me assure you that this is not the case. On the contrary: I’m amazed about the quality of the code emitted by the C# + .NET JIT compilers.)
Posted in .NET, Programming | 13 Comments »
Optimizing away
Posted by Dan Byström on December 16, 2008
Follow-up on Improving performance… and Genetic Programming: Evolution of Mona Lisa.
I just tested that I can optimize this loop:
unchecked
{
unsafe
{
fixed ( Pixel* psourcePixels = sourcePixels )
{
Pixel* p1 = (Pixel*)bd.Scan0.ToPointer();
Pixel* p2 = psourcePixels;
for ( int i = sourcePixels.Length ; i > 0 ; i--, p1++, p2++ )
{
int r = p1->R - p2->R;
int g = p1->G - p2->G;
int b = p1->B - p2->B;
error += r * r + g * g + b * b;
}
}
}
}
so that it runs even 60% faster. Don’t dare to tell you how, however.
EDIT: Continued here Ekeforshus
Posted in .NET, Programming | 4 Comments »
Besserwisser post on EvoLisa
Posted by Dan Byström on December 14, 2008
As you are all probably already familiar with, Roger Alsing got this really really cool idea earlier this week.
In short: would it be possible to construct a vector version of some image by overlaying just a few semi-transparent polygons? And if so, how do you figure out how these polygons would look like?
I can’t figure out how he got this idea. If someone had asked me if it could be done I’m afraid I’d just answered “Heck, no. No way. No use even trying.”. Therefore; a brilliant idea I must say!
Some people saw it as a proof of evolution. Some people saw it as proof of creationism. Some people saw it as an image compression algorithm. Some people saw it as a cool but useless toy. I guess some other people didn’t know what to think.
I guess I saw it as a powerful demonstration of a technique to use when you really have no idea how to attack a really complicated problem. I wounder if someone know how to construct an algorithm that directly converges towards the desired image without using randomness? The word “tricky” really sounds like an understatement!
Out of curiosity, I looked at the fitness function. That is, the piece of code that tries to compare how similar/different two images are. Then I noticed that since Roger apparently had been under much pressure to reveal his quick and dirty hack to the public, he hadn’t had the time to optimize that code. So I couldn’t resist doing just that.
And behold: the performance increase was a whopping 25 times! Yeah,a 25 time speed increase is like cruising in 50 km/h compared to breaking the sound barrier. Pretty cool, eh?
Posted in Programming | 1 Comment »
Improving performance…
Posted by Dan Byström on December 14, 2008
…of the fitness function in the EvoLisa project. In case you managed to miss it, EvoLisa is an already world famous project created by Roger Alsing earlier this week.
Continued from this post.
With just a few changes, we can actually make the original fitness function run 25 times faster. I’ll start by presenting the original code and then directly the improved version. After that I’ll discuss my reasoning behind each one of the changes.
Original code:
public static class FitnessCalculator
{
public static double GetDrawingFitness( DnaDrawing newDrawing, Color[,] sourceColors )
{
double error = 0;
using ( var b = new Bitmap( Tools.MaxWidth, Tools.MaxHeight, PixelFormat.Format24bppRgb ) )
using ( Graphics g = Graphics.FromImage( b ) )
{
Renderer.Render(newDrawing, g, 1);
BitmapData bmd1 = b.LockBits(
new Rectangle( 0, 0, Tools.MaxWidth, Tools.MaxHeight ),
ImageLockMode.ReadOnly,
PixelFormat.Format24bppRgb );
for ( int y = 0 ; y < Tools.MaxHeight ; y++ )
{
for ( int x = 0 ; x < Tools.MaxWidth ; x++ )
{
Color c1 = GetPixel( bmd1, x, y );
Color c2 = sourceColors[x, y];
double pixelError = GetColorFitness( c1, c2 );
error += pixelError;
}
}
b.UnlockBits( bmd1 );
}
return error;
}
private static unsafe Color GetPixel( BitmapData bmd, int x, int y )
{
byte* p = (byte*)bmd.Scan0 + y * bmd.Stride + 3 * x;
return Color.FromArgb( p[2], p[1], p[0] );
}
private static double GetColorFitness( Color c1, Color c2 )
{
double r = c1.R - c2.R;
double g = c1.G - c2.G;
double b = c1.B - c2.B;
return r * r + g * g + b * b;
}
}
Optimized code, 25 times as fast:
public struct Pixel
{
public byte B;
public byte G;
public byte R;
public byte A;
}
public class NewFitnessCalculator : IDisposable
{
private Bitmap _bmp;
private Graphics _g;
public NewFitnessCalculator()
{
_bmp = new Bitmap( Tools.MaxWidth, Tools.MaxHeight );
_g = Graphics.FromImage( _bmp );
}
public void Dispose()
{
_g.Dispose();
_bmp.Dispose();
}
public double GetDrawingFitness( DnaDrawing newDrawing, Pixel[] sourcePixels )
{
double error = 0;
Renderer.Render(newDrawing, g, 1);
BitmapData bd = _bmp.LockBits(
new Rectangle( 0, 0, Tools.MaxWidth, Tools.MaxHeight ),
ImageLockMode.ReadOnly,
PixelFormat.Format32bppArgb );
unchecked
{
unsafe
{
fixed ( Pixel* psourcePixels = sourcePixels )
{
Pixel* p1 = (Pixel*)bd.Scan0.ToPointer();
Pixel* p2 = psourcePixels;
for ( int i = sourcePixels.Length ; i > 0 ; i--, p1++, p2++ )
{
int r = p1->R - p2->R;
int g = p1->G - p2->G;
int b = p1->B - p2->B;
error += r * r + g * g + b * b;
}
}
}
}
_bmp.UnlockBits( bd );
return error;
}
}
First of all we notice that each time the fitness function is called, a new bitmap is constructed, used and then destroyed. This is fine for a function that seldom gets called. But for a function that is repeatedly called, we’ll be far better off if we reuse the same Bitmap and Graphics objects over and over.
Therefore I have changed the class from being static into one that muct be instantiated. Of course, that requires some minor changes to the consumer of this class, but in my opinion this will only be for the better. Although convenient, static methods (and/or singletons) are very hostile to unit testing and mocking, so I’m trying to move away from them anyway.
To my surprise, this first optimization attempt only buys us a few percent performance increase. I’m somewhat surprised at this, but anyway, it’s a start, and now it will get better. Read on.
So, once we’ve added a constructor to create the bitmap and graphics objects once and for all (as we’ll as making the class disposable so that the two GDI+ objects can be disposed) we move on to the real performance issues:
for ( int y = 0 ; y < Tools.MaxHeight ; y++ )
{
for ( int x = 0 ; x < Tools.MaxWidth ; x++ )
{
Color c1 = GetPixel( bmd1, x, y );
Color c2 = sourceColors[x, y];
double pixelError = GetColorFitness( c1, c2 );
error += pixelError;
}
}
This code looks pretty innocent, eh? It is not.
Even for a moderately sized bitmap, say 1,000 by 1,000 pixels, the code in the inner loop is executed 1,000,000 times. Thats a pretty big number. This means that each tiny little “error”, performance-wise, is multiplied by 1,000,000 so every little tiny tiny thing will count in the end.
So for example, just each method call will consume time compared to having the method’s code inline within the loop. Above we find two method calls GetPixel and GetColorFitness which will be far better off moved inside the loop, but as I will end up explaining is that the worst performance hog here is really the innocent looking line “Color c2 = sourceColors[x, y];”. Anyway, off we go:
unchecked
{
unsafe
{
for ( int y = 0 ; y < Tools.MaxHeight ; y++ )
{
for ( int x = 0 ; x < Tools.MaxWidth ; x++ )
{
byte* p = (byte*)bmd1.Scan0 + y * bmd1.Stride + 3 * x;
Color c1 = Color.FromArgb( p[2], p[1], p[0] );
Color c2 = sourceColors[x, y];
int R = c1.R - c2.R;
int G = c1.G - c2.G;
int B = c1.B - c2.B;
error += R * R + G * G + B * B;
}
}
}
}
The above changes, including changing the variables R, G & B from double into int will buy us approximately a 30% speed increase. Ain’t much compared to 25 times but still we’re moving on. Then we can look at the “Color c1″ and notice that we can get rid of it completely by simply changing the inner code like so:
byte* p = (byte*)bmd1.Scan0 + y * bmd1.Stride + 3 * x; Color c2 = sourceColors[x, y]; int R = p[2] - c2.R; int G = p[1] - c2.G; int B = p[0] - c2.B; error += R * R + G * G + B * B;
Now we actually have code that executes TWICE as fast as our original code. And now we must turn our attention to the first two. The rest I don’t think we can do much about.
Think about it. What we want to to is loop over each and every pixel in the image. Why then do we need to calculate the memory address for each pixel when we want to move to the next pixel? For each pixel we do completely unnecessary calculations. First “(byte*)bmd1.Scan0 + y * bmd1.Stride + 3 * x”; this contains four variables, two additions and two multiplications when really a single increment is all we need.
Then “sourceColors[x, y]“. Fast enough and nothing we can improve here, right? No, no no, this is far WORSE! It looks completely harmless, but not only is a similar formula as the previous one taking place behind the scenes; for each pixel, the x and y parameters are being bounds checked, ensuring that we do not pass illegal values to the array-lookup!!!
So this innocent-looking expression will cause someting like this to happen somewhere around a million times for each fitness calculation:
// pseudo-code if ( x < sourceColors.GetLowerBound( 0 ) || y < sourceColors.GetLowerBound( 1 ) || x > sourceColors.GetUpperBound( 0 ) || y > sourceColors.GetUpperBound( 1 ) ) throw new IndexOutOfRangeException( "(Index was outside the bounds of the array." ); Color c2 = *( &sourceColors + x * ( sourceColors.GetUpperBound( 1 ) + 1 ) + y );
Now we’re in for a little heavier refactoring. Unfortunately the sourcePixel matrix is laid out column-by-row instead of row-by-column which would have been better, so in order to solve this issue I’ll even change it into a vector of type “Pixel” instead. This requires change to the method signature and to the construction of the the matrix/vector itself of course, but once in place:
unchecked
{
unsafe
{
fixed ( Pixel* psourceColors = sourceColors )
{
Pixel* pc = psourceColors;
for ( int y = 0 ; y < Tools.MaxHeight ; y++ )
{
byte* p = (byte*)bmd1.Scan0 + y * bmd1.Stride;
for ( int x = 0 ; x < Tools.MaxWidth ; x++, p += 3, pc++ )
{
int R = p[2] - pc->R;
int G = p[1] - pc->G;
int B = p[0] - pc->B;
error += R * R + G * G + B * B;
}
}
}
}
}
we´re actually in for a performance improvement of 15 times!!!
Yeah, that’s actually how bad (performance-wise) the innocent looking line “Color c2 = sourceColors[x, y];” was. Bet some of you didn’t know that!!!
In order to change sourceColors from a matrix of Color into a vector of Pixel (declared as in the second code window above) I did this:
public static Pixel[] SetupSourceColorMatrix( Bitmap sourceImage )
{
if ( sourceImage == null )
throw new NotSupportedException( "A source image of Bitmap format must be provided" );
BitmapData bd = sourceImage.LockBits(
new Rectangle( 0, 0, Tools.MaxWidth, Tools.MaxHeight ),
ImageLockMode.ReadOnly,
PixelFormat.Format32bppArgb );
Pixel[] sourcePixels = new Pixel[Tools.MaxWidth * Tools.MaxHeight];
unsafe
{
fixed ( Pixel* psourcePixels = sourcePixels )
{
Pixel* pSrc = (Pixel*)bd.Scan0.ToPointer();
Pixel* pDst = psourcePixels;
for ( int i = sourcePixels.Length ; i > 0 ; i-- )
*( pDst++ ) = *( pSrc++ );
}
}
sourceImage.UnlockBits( bd );
return sourcePixels;
}
Probably a little overkill… but what the heck… Now I guess many people who are familiar with LockBits and direct pixel manipulation will cry out HEY YOU CAN’T DO THAT! YOU MUST TAKE THE “STRIDE” INTO ACCOUNT WHEN YOU MOVE TO A NEW SCAN LINE.
Well, yes… and no. Not when I use the PixelFormat.Format32bppArgb! Go figure!
So our new changes means that we process each ROW in the bitmap blindingly fast, as compared to the original version and combined with our caching of the bitmap we have gained a performance boost of 20 times!
Now for my final version I have rendered the drawing in PixelFormat.Format32bppArgb, which is the default format for bitmaps in GDI+. In that format each pixel will be exactly four bytes in size, which in turn means that GDI+ places no “gap” between the last pixel in one row and the first pixel in the next row and so we are actually able to treat the whole image as a single vector, processing it all in one go.
To conclude: in c# we can use unsafe code and pointer arithmetic to access an array far faster than the normal indexer, because we short-circuit the bounds checking. If we iterate over several consecutive elements in the array our gain is even larger, because we just increment the pointer instead of recalculating the memory address of the cells of the array over and over.
Normally we don’t want to bother with this because the gain of safe and managed code is bigger than the performance gain. But when it comes to image processing this perspective may not be as clear.
BTW, I wrote a post on a similar topic a few months ago: Soft Edged Images in GDI+.
EDIT: Continued here Ekeforshus
Posted in .NET, GDI+, Programming | 15 Comments »

