Dan Byström’s Bwain

Blog without an interesting name

Archive for the ‘Programming’ Category

.Net structs

Posted by Dan Byström on November 28, 2016

Recently, I followed a link in a tweet by Alexandre Mutel (creator of SharpDX, among other things).

To my surprise, that blog post linked to a StackOverflow question I had written myself, about a year and a half earlier.

That got me remembering why I asked that question – experimenting with serialization and deserialization of struct arrays – and also the fact that I have several times encountered expert programmers with cargo cult ideas about the behavior of structs in .Net. It is not uncommon to hear statements like “structs are always allocated on the stack”, “structs are slower than classes” and “a struct should not be larger than X bytes” (where X varies widely).

No matter what is true for structs in general, putting them in an array is often a game changer, and sometimes that is the best approach for a given problem. But as always: it depends. Although all this is old and trivial stuff, I have long thought that I’d some day write something on the topic. That day happened to be today.

Try to compile the following code, one time with Demo declared as a struct and one time as a class:

public [struct or class] Demo
{
  public double A;
  public double B;

  public static Demo[] AllocateArray(int size)
  {
    var arr = new Demo[size];
    for (var i = 0; i < arr.Length; i++)
      arr[i] = new Demo { A = i + 1, B = i + 2 };
    return arr;
  }

  public static void PlayAround(Demo[] arr)
  {
    for (var i = 1; i < arr.Length; i++)
      arr[i - 1].A = arr[i].A + arr[i].B;
  }
}


Let’s see what happens when we allocate the array (both as x86 and x64) and check the memory consumption, after passing a size of 1,000,000 items:

Memory (MB)
– class, x86: 26.7
– class, x64: 38.2
– struct, x86: 15.3
– struct, x64: 15.3

So, what’s going on? Take a look at the picture below.


Each oval represents an object that can be referenced and garbage collected individually. In the first case, with classes, we allocate 1,000,001 individual objects. In the second case, with structs, we allocate just one big object, where all the structs reside in consecutive memory.

Each object has an overhead of 8 or 16 bytes (x86 or x64) (this is a simplification), and a reference to an object takes 4 or 8 bytes (x86 or x64). So a reference plus object overhead is 12 or 24 bytes (x86 or x64). The two doubles A and B take 16 bytes together on both x86 and x64.

Theoretical calculation then gives (I have omitted the object overhead of the array itself, since it is negligible in comparison to the other numbers):

Formula (bytes) → Memory (MB)
– class, x86: 1000000 * (12 + 16) → 26.7
– class, x64: 1000000 * (24 + 16) → 38.1
– struct, x86: 1000000 * 16 → 15.3
– struct, x64: 1000000 * 16 → 15.3

Whoa! Theory and practice correlate! 🙂
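The arithmetic is easy to check; here is the same calculation as a small Python sketch (the header and reference sizes are the approximations stated above):

```python
BYTES_PER_MB = 1024 * 1024
N = 1_000_000
PAYLOAD = 16  # two doubles, A and B

def mb(nbytes):
    return round(nbytes / BYTES_PER_MB, 1)

# class: one reference per array slot plus one object (header + payload) per element
assert mb(N * (4 + 8 + PAYLOAD)) == 26.7    # x86: 4-byte reference, 8-byte header
assert mb(N * (8 + 16 + PAYLOAD)) == 38.1   # x64: 8-byte reference, 16-byte header

# struct: the payload is laid out inline in the array, nothing else
assert mb(N * PAYLOAD) == 15.3              # same on x86 and x64
```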

Now let’s play around with the allocated array for a few iterations (by calling the PlayAround method):

Time (ms)
– class, x86: 258
– class, x64: 313
– struct, x86: 117
– struct, x64: 128

Now let’s try to make the Demo struct/class significantly bigger by adding a few more doubles to it, bringing it up to a total of 128 bytes, by making the declaration look like this:

  public double A;
  public double B;
  public double pad1;
  public double pad2;
  public double pad3;
  public double pad4;
  public double pad5;
  public double pad6;
  public double pad7;
  public double pad8;
  public double pad9;
  public double pad10;
  public double pad11;
  public double pad12;
  public double pad13;
  public double pad14;
Time (ms)
– class, x86: 1018
– class, x64: 904
– struct, x86: 625
– struct, x64: 649

As you can see, in this particular case, the rule of thumb that a struct should not be larger than X bytes is not applicable (there is an MSDN article claiming that it should not be more than 16 bytes, for example).

As a side note, when it comes to passing around large structs, you always have the choice of passing them by ref, even if you do not mean to modify them, although that may confuse a reader of your code. In that case they won’t be copied onto the stack, but passed by reference, in a similar way to how class objects are passed by default.

So, to conclude, when putting structs into an array, you get a behavior that may contradict what is commonly claimed about structs.

Choose the best tool for your current problem and always benchmark instead of guessing.

Thanks for reading, and please remember that fast code consumes less energy.

PS. What I experimented with a while ago was serializing struct arrays directly to disk, with no intervening code at all. It turned out that with a local SSD drive, this method beat any serializer I tested hands down. But when serializing over a network, compressing the data – as, for example, protobuf-net does – gave better performance.
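As an illustration only (using Python’s struct module rather than the .NET code), the raw-blit idea amounts to writing fixed-stride records with no framing at all; the two-double layout below is borrowed from the Demo example:

```python
import struct

# pack a list of (A, B) double pairs into one contiguous byte blob --
# roughly what writing a Demo[] straight to disk amounts to
records = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
blob = b"".join(struct.pack("<2d", a, b) for a, b in records)
assert len(blob) == len(records) * 16  # 16 bytes per record, no framing

# deserialization is just a fixed-stride scan over the same bytes
unpacked = [struct.unpack_from("<2d", blob, i * 16) for i in range(len(records))]
assert unpacked == records
```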

DS. Thanks to Peter af Geijerstam and Per Rovegård for corrections.

Posted in Programming | Leave a Comment »

Problem logging in

Posted by Dan Byström on September 12, 2014

Yesterday I got a new motherboard in my computer, and after that I couldn’t log in to Windows servers at a customer anymore.

It took me quite a while to figure out why. Finally I happened to notice that the system’s clock was off by one month. Correcting the clock fixed the problem.

Posted in Programming | Leave a Comment »

VisionQuest, recording date 2014-04-15

Posted by Dan Byström on June 3, 2014

This is a snapshot in time of my personal project at factor10. I may or may not write about it more in the future.

The goal, obviously, is to visualize a source code base in an attractive and appealing way.

So far, I’ve been using the XNA framework, but since that has come to a grinding halt, I’m currently trying to move the project to SharpDX instead.



Posted in Programming | Leave a Comment »


Posted by Dan Byström on December 29, 2013

This post contains a synopsis of an algorithm I used to cut down execution time of the following code:

foreach (var mline in mlines)
  foreach (var xline in xlines)
    if (xline.AIds.Contains(mline.AId) && xline.BIds.Contains(mline.BId) && xline.CIds.Contains(mline.CId))
      // do something useful...

Given that there are 100,000 “xlines” and an average of 4 items in the AIds/BIds/CIds arrays – the execution time drops from 2 hours and 54 minutes down to 47 seconds when matching 1,000,000 “mlines”.

Since I was about to write this down anyway and it contains an unusual data structure – dictionaries of hash sets – I thought that I could just as well publish it.

Given problem: match something called “m-lines”:

# A-Id B-Id C-Id Relevant data…
0 34 56 45
1 12 39 776
2 19 56 92
3 576 243 646

with something called “x-lines”:

# A-Ids B-Ids C-Ids Relevant data… Match with m-lines
0 12,34,59 18,39,56,89 45,78 0
1 12,19,59,117 39,56,243 78,646
2 19,34,35,576 18,56,89 45,92,776 0,2
3 34,35,117,576 56,243 45,78,646,776 0,3

The rule for a match is simple: an m-line matches an x-line if the A-Id of the m-line is present in the A-Ids of the x-line, and the same for the B-Id in the B-Ids and the C-Id in the C-Ids.

This is pretty straightforward, and as long as we have just a few hundred lines it will be just fine. But as the number of lines increases, it becomes painfully slow. On my laptop, 1,000,000 m-lines combined with 100,000 x-lines takes almost 3 hours. And this is supposed to happen during user interaction!

The solution I came up with is to first pre-process the x-lines, producing the following lookups:

A Dictionary (key: int, value: HashSet<int>):
  12 → 0,1
  19 → 1,2
  34 → 0,2,3
  35 → 2,3
  59 → 0,1
  117 → 1,3
  576 → 2,3

B Dictionary (key: int, value: HashSet<int>):
  18 → 0,2
  39 → 0,1
  56 → 0,1,2,3
  89 → 0,2
  243 → 1,3

C Dictionary (key: int, value: HashSet<int>):
  45 → 0,2,3
  78 → 0,1,3
  92 → 2
  646 → 1,3
  776 → 2,3

The simple idea is that this gives all the indexes of x-lines containing a certain Id, for the A’s, B’s and C’s respectively. The set of x-lines matching a certain m-line is then simply the intersection of these three hash sets. And intersection is a fast operation.

foreach (var mline in mlines)
{
  HashSet<int> h1, h2, h3;
  if (!alookup.TryGetValue(mline.AId, out h1))
    continue;
  if (!blookup.TryGetValue(mline.BId, out h2))
    continue;
  if (!clookup.TryGetValue(mline.CId, out h3))
    continue;
  putSmallestHashSetFirst(ref h1, ref h2, ref h3);
  foreach (var idx in h1.Where(_ => h2.Contains(_) && h3.Contains(_)))
  {
    var xline = xlines[idx];
    // do something useful...
  }
}
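For illustration, here is the same pre-processing and intersection sketched in Python, using the example x-lines from above:

```python
from collections import defaultdict

xlines = [
    {"A": {12, 34, 59},       "B": {18, 39, 56, 89}, "C": {45, 78}},
    {"A": {12, 19, 59, 117},  "B": {39, 56, 243},    "C": {78, 646}},
    {"A": {19, 34, 35, 576},  "B": {18, 56, 89},     "C": {45, 92, 776}},
    {"A": {34, 35, 117, 576}, "B": {56, 243},        "C": {45, 78, 646, 776}},
]

def build_lookup(key):
    # id -> set of x-line indexes whose id-list contains that id
    lookup = defaultdict(set)
    for idx, xline in enumerate(xlines):
        for id_ in xline[key]:
            lookup[id_].add(idx)
    return lookup

alookup, blookup, clookup = (build_lookup(k) for k in "ABC")

def matching_xlines(a_id, b_id, c_id):
    empty = set()
    return (alookup.get(a_id, empty)
            & blookup.get(b_id, empty)
            & clookup.get(c_id, empty))

assert matching_xlines(34, 56, 45) == {0, 2, 3}   # m-line 0 matches x-lines 0, 2 and 3
assert matching_xlines(12, 39, 776) == set()      # m-line 1 matches nothing
```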

So there it is: a nice speed up of a factor 222!

It is not only the user’s waiting time we have saved here – we have also saved a lot of electricity by cutting down the computing time. Fast algorithms should indeed be classified as eco-friendly! 🙂

Therefore, I propose the following New Year’s resolutions for 2014:

  • try to eat less meat
  • try to travel by public transportation whenever possible
  • try to write more efficient algorithms

And then we have helped a little tiny bit to save the environment! Happy New Year!

Posted in Programming | Leave a Comment »

Thumbnails with glass table reflection in GDI+

Posted by Dan Byström on January 12, 2009

I’ve been playing around with image processing lately, and since my last post about loading thumbnail images from files, I couldn’t help myself from trying to roll my own “Web 2.0 reflection effect” directly in .NET 2.0, with no 3D support whatsoever. Actually, I think I was more inspired by Windows Vista’s thumbnails (to the right) than by the web.

This is what I eventually came up with:


Although this is all fairly easy, there were a few things that couldn’t be done in “pure” GDI+, and some uncommon approaches are involved in my solution, so I think there may be some people out there who don’t find this totally trivial. So I thought it might be worth writing down.

From the original picture I work through four steps:


1. The first step merely shrinks the original picture to the desired size and puts a frame around it. This is trivial:

	protected virtual Bitmap createFramedBitmap( Bitmap bmpSource, Size szFull )
	{
		Bitmap bmp = new Bitmap( szFull.Width, szFull.Height );
		using ( Graphics g = Graphics.FromImage( bmp ) )
		{
			g.FillRectangle( FrameBrush, 0, 0, szFull.Width, szFull.Height );
			g.DrawRectangle( BorderPen, 0, 0, szFull.Width - 1, szFull.Height - 1 );
			g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
			g.DrawImage( bmpSource,
				new Rectangle( FrameWidth, FrameWidth, szFull.Width - FrameWidth * 2, szFull.Height - FrameWidth * 2 ),
				new Rectangle( Point.Empty, bmpSource.Size ),
				GraphicsUnit.Pixel );
		}
		return bmp;
	}

(Note the InterpolationMode property. It is important in order to resize the image with good quality!)

2. The second step is substantially more involved. It takes the result from step 1 and does this, all in one go:

  1. Flip the image upside down (omitting the upper and lower parts of frame, since we don’t want them to be present in the reflection).
  2. Apply a Gaussian blur convolution effect to make the image look…, well, blurred… 🙂
  3. Wash out some color to make the reflection a little bit grayish.
  4. Apply an alpha blend fall out.

Flipping the image and the color wash-out can both be done directly in GDI+: flip either with Bitmap.RotateFlip or with a transformation matrix, and use a ColorMatrix to alter the colors. But since neither a blur effect nor an alpha blend can be done without direct pixel manipulation, I did it all in one go. For the blur effect, see Christian Graus’s excellent article series Image Processing for Dummies with C# and GDI+. I’ve blogged previously on how to perform alpha blending by drawing the blend using a PathGradientBrush or a LinearGradientBrush, in Soft edged images in GDI+. This time I will calculate the alpha value instead. The calculation is done once for every scan line and is located in a virtual function, so that the formula can be overridden.

All four “effects” are handled in this loop:

	for ( int y = height - 1 ; y >= 0 ; y-- )
	{
		byte alpha = (byte)(255 * calculateAlphaFallout( (double)(height - y) / height ));
		Pixel* pS = (Pixel*)bdS.Scan0.ToPointer() + bdS.Width * (bdS.Height - y - FrameWidth - 1);
		Pixel* pD = (Pixel*)bdD.Scan0.ToPointer() + bdD.Width * y;
		for ( int x = bdD.Width ; x > 0 ; x--, pD++, pS++ )
		{
			int R = gaussBlur( &pS->R, nWidthInPixels );
			int G = gaussBlur( &pS->G, nWidthInPixels );
			int B = gaussBlur( &pS->B, nWidthInPixels );
			pD->R = (byte)((R * 3 + G * 2 + B * 2) / 7);
			pD->G = (byte)((R * 2 + G * 3 + B * 2) / 7);
			pD->B = (byte)((R * 2 + G * 2 + B * 3) / 7);
			pD->A = alpha;
		}
	}

Flipping happens on line 4, blurring on lines 8-10 (the gaussBlur function is just a one-liner). Color wash-out is done on lines 11-13 and the alpha fall-out on lines 3 and 14.

The very observant reader will notice that I let the blurring wrap from one edge to the other. This is a hack, but it works since the left and right edges are always exactly identical. In production code, it might be a good idea to make the blurring optional (or even to provide a user-defined convolution matrix), and to do the same for the color wash-out, which currently uses hard-coded values.
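As a sketch of the arithmetic (Python for readability; the alpha formula below is an assumed linear fall-out, since calculateAlphaFallout is virtual and overridable):

```python
def washout(r, g, b):
    # 3 parts the channel's own value, 2 parts each of the other two channels
    return ((r * 3 + g * 2 + b * 2) // 7,
            (r * 2 + g * 3 + b * 2) // 7,
            (r * 2 + g * 2 + b * 3) // 7)

def alpha_fallout(y, height):
    # assumed linear fade: opaque at the top scan line, fading out toward the bottom
    return int(255 * (height - y) / height)

assert washout(255, 0, 0) == (109, 72, 72)  # pure red drifts toward gray
assert washout(90, 90, 90) == (90, 90, 90)  # gray is left untouched
assert alpha_fallout(0, 100) == 255
```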

3. Create the “half sheared” bitmap:

This transform cannot be accomplished using a linear transformation, but (after taking quite a detour on this) I realized that it can be done embarrassingly simply:

	using ( Graphics g = Graphics.FromImage( Thumbnail ) )
		for ( int x = 0 ; x < sz.Width ; x++ )
			g.DrawImage( bmpFramed, // the framed bitmap from step 1
				new RectangleF( x, 0, 1, sz.Height - Skew * (float)(sz.Width - x) / sz.Width ),
				new RectangleF( x, 0, 1, sz.Height ),
				GraphicsUnit.Pixel );

I simply use DrawImage to draw each column on its own, transferring one column of the framed image from step 1 to a column of different height. Note that it is extremely important that we pass floats and not ints – in the latter case the result will be a disaster.

4. Draw the reflection image through a shear transform, like this:

	using ( Graphics g = Graphics.FromImage( Thumbnail ) )
	{
		System.Drawing.Drawing2D.Matrix m = g.Transform;
		m.Shear( 0, (float)Skew / sz.Width );
		m.Translate( 0, sz.Height - Skew - 1 );
		g.Transform = m;
		g.DrawImage( bmpReflection, Point.Empty );
	}

Download demo source code

A Lesson Learned

At first, I tried to do the shearing in both steps 3 and 4 myself, using code similar to this (still one column at a time):

	// this was a bad idea
	private static void paintRowWithResize(
		BitmapData bdDst,
		BitmapData bdSrc,
		int nDstColumn,
		int nSrcColumn,
		int nDstRow,
		double dblSrcRow,
		int nRows,
		double dblStep )
	{
		Pixel* pD = (Pixel*)bdDst.Scan0.ToPointer() + nDstColumn + nDstRow * bdDst.Width;
		Pixel* pS = (Pixel*)bdSrc.Scan0.ToPointer() + nSrcColumn;
		while ( nRows-- > 0 )
		{
			int nYSrc = (int)dblSrcRow;
			Pixel p1 = pS[nYSrc * bdSrc.Width];
			Pixel p2 = pS[(nYSrc + 1) * bdSrc.Width];
			double frac2 = dblSrcRow - nYSrc;
			double frac1 = 1.0 - frac2;
			pD->R = (byte)(p1.R * frac1 + p2.R * frac2);
			pD->G = (byte)(p1.G * frac1 + p2.G * frac2);
			pD->B = (byte)(p1.B * frac1 + p2.B * frac2);
			pD->A = (byte)(p1.A * frac1 + p2.A * frac2);

			dblSrcRow += dblStep;
			pD += bdDst.Width;
		}
	}

The result looked perfectly good, but after thinking about it for a while I realized that this piece of code is actually completely and utterly wrong in the general case: it only works when the alpha values of the two adjacent pixels are very close. In other cases the result will be poor.

Perhaps the easiest way to see this is to think about what happens when we want a 50% mix of a completely transparent pixel and a completely opaque white pixel. Intuitively, I think it’s clear that we want the result to be a white pixel with 50% transparency. However, if we represent the transparent pixel with (0,0,0,0) (the most common value, I’d guess, although (0,x,x,x) is transparent regardless of the value of x), we get a gray half-transparent pixel instead: (127,127,127,127). Not right at all. The reason my attempt looked good in the first place was simply that I had a gray border around the images!
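Here is a small Python sketch of that argument (not the GDI+ code; pixels are (A, R, G, B) tuples): naive per-component interpolation gives the gray result, while interpolating premultiplied values gives the expected white:

```python
def naive_mix(p1, p2, t=0.5):
    # component-wise interpolation of straight (non-premultiplied) ARGB -- wrong in general
    return tuple(round(a * (1 - t) + b * t) for a, b in zip(p1, p2))

def premultiplied_mix(p1, p2, t=0.5):
    # premultiply RGB by the alpha fraction, interpolate, then un-premultiply
    def pre(p):
        a = p[0] / 255
        return (p[0], *(c * a for c in p[1:]))
    a, r, g, b = (x * (1 - t) + y * t for x, y in zip(pre(p1), pre(p2)))
    if a == 0:
        return (0, 0, 0, 0)
    return (round(a), round(r * 255 / a), round(g * 255 / a), round(b * 255 / a))

transparent  = (0, 0, 0, 0)          # (A, R, G, B)
opaque_white = (255, 255, 255, 255)

assert naive_mix(transparent, opaque_white) == (128, 128, 128, 128)          # half-transparent gray: wrong
assert premultiplied_mix(transparent, opaque_white) == (128, 255, 255, 255)  # half-transparent white: right
```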

So how do we mix pixels with alpha values? Obviously “normal” alpha blending is not sufficient when both pixels carry alpha values… after thinking about this for a few minutes, I said to myself: “why not ask someone who knows instead?”. That someone is of course Graphics.DrawImage, and so I ended up with much cleaner code. And although I never bothered to figure out how to mix and blend pixels when both contain alpha values, I ended up realizing this:

Graphics.DrawImage has quite a bit of work to do when we draw a 32-bit bitmap on top of another. If we have some a-priori knowledge of the nature of the bitmaps we’re working with (are any of them totally opaque?) then it is actually possible to do this ourselves much faster than Graphics.DrawImage has a chance to, because it is forced to work with the general case: both bitmaps may be semi-transparent.

I will get back to how we can, in some cases, outperform Graphics.DrawImage (when it comes to speed), and hopefully with a real-life case where we actually bother. Stay tuned. Ekeforshus

Posted in .NET, GDI+, Programming | 12 Comments »

Image.GetThumbnailImage and beyond

Posted by Dan Byström on January 5, 2009

Once I tried to use Image.GetThumbnailImage because I wanted to fire up small thumbnails as fast as possible. So I tried:

	// totally and completely, utterly useless
	private Bitmap getThumbnailImage( string filename, int width, int height )
	{
		using ( Image img = Image.FromFile( filename ) )
			return (Bitmap)img.GetThumbnailImage( width, height, null, IntPtr.Zero );
	}

This turned out to be completely useless, since Image.FromFile first loads the full image, so no performance is gained whatsoever. Trying to google for a solution only resulted in tons of articles saying that Image.GetThumbnailImage is pretty useless and shouldn’t be used. So I dropped it and solved my problem in a completely different way. Now I just stumbled across an overloaded version of Image.FromStream which I hadn’t noticed before:

Image.FromStream( Stream stream, bool useEmbeddedColorManagement, bool validateImageData )

This opens up some interesting uses. For example, it means that we really can get a thumbnail fast:

	// way, way faster, but still pretty useless
	private Bitmap getThumbnailImage( string filename, int width, int height )
	{
		using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
		using ( Image img = Image.FromStream( fs, true, false ) )
			return (Bitmap)img.GetThumbnailImage( width, height, null, IntPtr.Zero );
	}

This is still pretty useless, since this way we really don’t know how to get a proportional thumbnail. This is something that seems to be lacking in GDI+: an easy way to rescale images proportionally. Quite frankly: how often are we interested in non-proportional rescales? Not that often, I’d say! Here’s a better version:

	// actually works...
	private Bitmap getThumbnailImage( string filename, int width )
	{
		using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
		using ( Image img = Image.FromStream( fs, true, false ) )
			return (Bitmap)img.GetThumbnailImage(
				width,
				width * img.Height / img.Width,
				null,
				IntPtr.Zero );
	}

But if we arm ourselves with a way to rescale images proportionally, something Microsoft apparently decided to leave as an exercise for each and every programmer who wants to do even the simplest things with images in GDI+:

	public static Size adaptProportionalSize(
		Size szMax,
		Size szReal )
	{
		int nWidth;
		int nHeight;
		double sMaxRatio;
		double sRealRatio;

		if ( szMax.Width < 1 || szMax.Height < 1 || szReal.Width < 1 || szReal.Height < 1 )
			return Size.Empty;

		sMaxRatio = (double)szMax.Width / (double)szMax.Height;
		sRealRatio = (double)szReal.Width / (double)szReal.Height;

		if ( sMaxRatio < sRealRatio )
		{
			nWidth = Math.Min( szMax.Width, szReal.Width );
			nHeight = (int)Math.Round( nWidth / sRealRatio );
		}
		else
		{
			nHeight = Math.Min( szMax.Height, szReal.Height );
			nWidth = (int)Math.Round( nHeight * sRealRatio );
		}

		return new Size( nWidth, nHeight );
	}
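The same proportional-fit logic, restated as a Python sketch for clarity:

```python
def adapt_proportional_size(max_w, max_h, real_w, real_h):
    if min(max_w, max_h, real_w, real_h) < 1:
        return (0, 0)
    max_ratio = max_w / max_h
    real_ratio = real_w / real_h
    if max_ratio < real_ratio:
        # width is the limiting dimension
        w = min(max_w, real_w)
        return (w, round(w / real_ratio))
    # height is the limiting dimension
    h = min(max_h, real_h)
    return (round(h * real_ratio), h)

assert adapt_proportional_size(100, 100, 400, 300) == (100, 75)   # landscape fits by width
assert adapt_proportional_size(100, 100, 300, 400) == (75, 100)   # portrait fits by height
assert adapt_proportional_size(500, 500, 400, 300) == (400, 300)  # never upscales
```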

With that, we can fire up a thumbnail image fast, with a given maximum allowed size while still proportional:

	// even better...
	private Bitmap getThumbnailImage( string filename, Size szMax )
	{
		using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
		using ( Image img = Image.FromStream( fs, true, false ) )
		{
			Size sz = adaptProportionalSize( szMax, img.Size );
			return (Bitmap)img.GetThumbnailImage(
				sz.Width,
				sz.Height,
				null,
				IntPtr.Zero );
		}
	}

So, it appears that Image.GetThumbnailImage had its use after all! But there’s more we can do with this.

Even though we managed to load thumbnail images fast and proportionally, the quality isn’t particularly good. That seems to be the main concern among those who advocate not using Image.GetThumbnailImage at all. If a thumbnail is found in the image file, it has already been resized once, and resizing a resized image once more certainly won’t improve the quality, especially if a crappy resizing algorithm is being used. Let’s see what we can do about this. If we’re working with JPG images coming from a digital camera, we can most probably find the “real” thumbnail image like this:

	private Bitmap getExifThumbnail( string filename )
	{
		using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
		using ( Image img = Image.FromStream( fs, true, false ) )
			foreach ( PropertyItem pi in img.PropertyItems )
				if ( pi.Id == 20507 )
					return (Bitmap)Image.FromStream( new MemoryStream( pi.Value ) );
		return null;
	}

If we can retrieve a thumbnail this way, it will be in its original size, and so we can skip an implicit resize. If we want to rescale it anyway, we can do it in a high-quality fashion. I don’t know if this is something everybody knows – I had worked with GDI+ for quite some time before I found it – but the fact is that resizing an image like this gives good performance but a crappy result:

	// poor image quality
	Bitmap bmpResized = new Bitmap( bmpOriginal, newWidth, newHeight );

Instead, try the following:

	// superior image quality
	Bitmap bmpResized = new Bitmap( newWidth, newHeight );
	using ( Graphics g = Graphics.FromImage( bmpResized ) )
	{
		g.InterpolationMode = InterpolationMode.HighQualityBicubic;
		g.DrawImage( bmpOriginal,
			new Rectangle( Point.Empty, bmpResized.Size ),
			new Rectangle( Point.Empty, bmpOriginal.Size ),
			GraphicsUnit.Pixel );
	}

With these code pieces glued together, I can now get a thumbnail from a JPG image both faster and with better quality than with my original Image.GetThumbnailImage attempt!

UPDATE: This technique is used “live” in the demo source code accompanying this post: Thumbnails with glass table reflection in GDI+.

Finally, one more thing that I came to think of while I was typing this. I have complained in earlier posts that Image.FromFile, for some obscure reason, keeps the file locked until disposed of. I just realized that there is an easy way around this:

	Bitmap bmp;
	using ( FileStream fs = new FileStream( filename, FileMode.Open ) )
		bmp = (Bitmap)Image.FromStream( fs );

Behold – now the image is loaded and the file is NOT locked! 🙂

Posted in .NET, GDI+, Programming | 17 Comments »

Optimizing away II.3

Posted by Dan Byström on January 1, 2009

Oh, the pain, the pain and the embarrassment…

I just came to realize that although a “long” in C# is 64 bits, in C++ it is still 32 bits. In order to get a 64 bit value in MSVC++ you must type either “long long” or “__int64”. I didn’t know that. 😦

This means that although the assembler function I just presented correctly calculates a 64 bit value, it will be truncated to 32 bits because the surrounding C++ function is declared as a long.

This in turn means that for bitmaps larger than 138 x 138 pixels, the correct result cannot be guaranteed. (With 64-bit values, the bitmap can instead be 9724315 x 9724315 pixels in size before an overflow can occur.)
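A quick sanity check of both bounds, assuming the worst case of a squared difference of 255² in each of the three channels per pixel:

```python
MAX_PER_PIXEL = 3 * 255 ** 2  # worst-case contribution of one RGB pixel

# 138 x 138 pixels still fits a 32-bit accumulator...
assert 138 * 138 * MAX_PER_PIXEL < 2 ** 32
# ...while a 64-bit accumulator is safe up to 9724315 x 9724315 pixels
assert 9724315 ** 2 * MAX_PER_PIXEL < 2 ** 64
assert 9724316 ** 2 * MAX_PER_PIXEL > 2 ** 64  # one pixel wider/taller can overflow
```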

Unfortunately, although I had unit tests to verify the correctness of the function, I only tested with small bitmaps.

I have uploaded a new version.

Posted in .NET, Programming | 3 Comments »

Optimizing away II.2

Posted by Dan Byström on December 30, 2008

I was asked to upload the source and binary for my last post.

Posted in .NET, Programming | 2 Comments »

Optimizing away II

Posted by Dan Byström on December 22, 2008

Continued from Optimizing away. Ok, now I have worked up the courage.

Prepare yourself for a major disappointment. I really do not know how to tweak that C#-loop to run a nanosecond faster. But I can do the same calculation much faster. How? Just my old favorite party trick. It goes like this:

1. Add a new project to your solution

2. Choose Visual C++ / CLR / Class Library

3. Insert the following managed class:

	public ref class FastImageCompare
	{
	public:
		static double compare( void* p1, void* p2, int count )
		{
			return NativeCode::fastImageCompare( p1, p2, count );
		}
		static double compare( IntPtr p1, IntPtr p2, int count )
		{
			return NativeCode::fastImageCompare( p1.ToPointer(), p2.ToPointer(), count );
		}
	};

4. Insert the following function into an unmanaged class (which I happened to call NativeCode):

unsigned long long NativeCode::fastImageCompare( void* p1, void* p2, int count )
{
	int high32 = 0;

	__asm
	{
		push	ebx
		push	esi
		push	edi

		mov		esi, p1
		mov		edi, p2
		xor		eax, eax
again:
		dec		count
		js		done

		movzx	ebx, byte ptr [esi]
		movzx	edx, byte ptr [edi]
		sub		edx, ebx
		imul	edx, edx

		movzx	ebx, byte ptr [esi+1]
		movzx	ecx, byte ptr [edi+1]
		sub		ebx, ecx
		imul	ebx, ebx
		add		edx, ebx

		movzx	ebx, byte ptr [esi+2]
		movzx	ecx, byte ptr [edi+2]
		sub		ebx, ecx
		imul	ebx, ebx
		add		edx, ebx

		add		esi, 4
		add		edi, 4

		add		eax, edx
		jnc		again

		inc		high32
		jmp		again
done:
		mov		edx, high32

		pop		edi
		pop		esi
		pop		ebx
	}
}
Yeah. That’s it. Hand-tuned assembly language within a .NET assembly. UPDATE 2009-01-01: the return type of the function was changed from “unsigned long” to “unsigned long long”, see here.
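For reference, the quantity the routine accumulates – the sum of squared per-channel differences, returned in edx:eax – restated as a Python sketch:

```python
def image_error(pixels1, pixels2):
    # sum of squared per-channel differences over (R, G, B) tuples
    return sum((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2
               for (r1, g1, b1), (r2, g2, b2) in zip(pixels1, pixels2))

assert image_error([(255, 0, 0)], [(0, 0, 255)]) == 2 * 255 ** 2
assert image_error([(10, 10, 10)], [(10, 10, 10)]) == 0
```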

I guess that’s almost cheating. And we will be locked to the Intel platform. Most people won’t mind, I guess, but others may have very strong feelings about it. If we really want to exploit this kind of optimization while still being portable (to Mono/Mac, for example), one possibility would be to load the assembly containing the native code dynamically. If that fails, we could fall back to an alternative version written in pure managed code.

(I know from experience that some people with lesser programming skills react to this with a “what? it must be a crappy compiler if you can write faster code by yourself”. Let me assure you that this is not the case. On the contrary: I’m amazed about the quality of the code emitted by the C# + .NET JIT compilers.)

Posted in .NET, Programming | 13 Comments »

Optimizing away

Posted by Dan Byström on December 16, 2008

Follow-up on Improving performance…  and Genetic Programming: Evolution of Mona Lisa.
I just tested that I can optimize this loop:

        fixed ( Pixel* psourcePixels = sourcePixels )
        {
            Pixel* p1 = (Pixel*)bd.Scan0.ToPointer();
            Pixel* p2 = psourcePixels;
            for ( int i = sourcePixels.Length ; i > 0 ; i--, p1++, p2++ )
            {
                int r = p1->R - p2->R;
                int g = p1->G - p2->G;
                int b = p1->B - p2->B;
                error += r * r + g * g + b * b;
            }
        }

so that it runs as much as 60% faster. I don’t dare tell you how just yet, however.

EDIT: Continued here

Posted in .NET, Programming | 4 Comments »