Dan Byström’s Bwain

Blog without an interesting name

Archive for the ‘Programming’ Category

Real Programmers don’t use Pascal – 25 year annivarsary

Posted by Dan Byström on July 31, 2008

I just happened to notice that the legendary text “Real Programmers Don’t Use Pascal” was published exactly 25 years ago this month (and I just have a couple of hours left before this month ends, so I thouht I had to write up something quickly). You can often find it on the Net filed under “humor”. This can, of course, only be done by Quiche Eaters! Sometimes someone has replaced Pascal with Visual Basic. I have no problem with that. 🙂

Among all the words of wisdom, I found these:

  • Real Programmers aren’t afraid to use GOTOs.
  • Real Programmers can write five page long DO loops without getting confused.
  • Real Programmers like Arithmetic IF statements– they make the code more interesting.
  • Real Programmers write self-modifying code, especially if they can save 20 nanoseconds in the middle of a tight loop.
  • Real Programmers don’t need comments– the code is obvious.
  • Since Fortran doesn’t have a structured IF, REPEAT … UNTIL, or CASE statement, Real Programmers don’t have to worry about not using them. Besides, they can be simulated when necessary using assigned GOTOs.

I guess that the number of programmers out there who have written (or even know what it means to write) arithmetic IF-statements or self-modifying code are decreasing rapidly by the day. And writing a five page long DO loops without getting confused may be a cool party trick, but nothing to applause in real work. But the remaining two points are something I’d like to comment!

Real Programmers aren’t afraid to use GOTOs

Well, why on earth should they be? Really? This “truth” seems to be something that all student know by heart. Thou shalt not use GOTO. Thou shalt not use GOTO. Without even understanding why GOTO was such a big problem in the first place. In a structured language it is not. How could it be that more students seem to know about this silly “non-problem” while at the same time failing to understand the most fundamental thing about programming: Thou shalt never duplicate code. Thou shalt not use copy-and-paste. I just wonder…

Real Programmers don’t need comments– the code is obvious

I think this is almost true! It only needs slight modification:

Real Programmes write code in such a way that the code becomes obvious

I have come to realise that when I download a piece of code from the Net, the first thing I usually have to do is delete all the comments so that I can read the code!!! Most of the time, comments seem to have been put there just “because you must write comments”. Then after a time, changes are made to the code and the comments are not updated, leaving any reader totally confused! Skip the comments and write clean code with long descriptive methods and variable names I say!

(This goes for much of the documentation too, I think. The code is the best documentations because it is “the truth”! This is one of the reasons Tests (Test Driven Development) is such a great thing: you get “true documentation” for free.)

Posted in Nostalgia, Programming | Leave a Comment »

Quote of the day

Posted by Dan Byström on July 18, 2008

“He doesn’t seem to have all his methods compiled”.
— Douglas Nilsson Ekeforshus

Posted in Programming | Leave a Comment »

VB.NET as a DSL

Posted by Dan Byström on July 4, 2008

The other day I got a glimpse of something that could turn out to be really useful. Or maybe a Pandora’s Box. I really don’t know until I investigated it more. 🙂

I was asked to add some simple string manipulation functions (to something that we for simplicity can think of as a report generator).

After thinking about the problem I decided that I would violate the YAGNI (you ain’t gonna need it) principle and give the customer the full Monty directly. Why? Because I realized that it was so incredibly easy to give them the possibility to write full fledged VB.NET code directly that it would probably take the same amount of time to code as some basic string manipulation functions together with a UI!

Consider the following facts:

  1. It is extremely easy (and fast) to compile a piece of VB.NET (or C#) code into an in-memory assembly and execute that code. It takes roughly 20 lines of code.
  2. This dynamically compiled code can easily be given access to any specified part of the application or it’s business objects (just by adding the appropriate assemblies).
  3. Of course the code to be compiled can be preprocessed first, in order to provide the user with a syntax as simple as possible.

I think that these facts taken together implies that opening up parts of the business logic code to the user / customer / domain expert could be worthwhile.

So, the next time I feel that:

  1. I am just transforming the domain expert’s words into executable code
  2. I may not have fully understood the intention of the domain expert
  3. the code I’m writing is likely to change in the future

Then I will definitely consider making that code modifiable by both the domain experts and me (or most likely: read by the domain experts and modified by me and them together). The choice of VB.NET over C# seems obvious; novices read Basic easier – it is no coincidence that “B” in BASIC stands for Beginner. Many users have also encountered VBA while playing around with macros in Word and Excel. Ekeforshus

Posted in .NET, Programming | Leave a Comment »

Insomnia

Posted by Dan Byström on March 6, 2008

To iterate is human, to recurse divine.

L. Peter Deutsch

To iterate is boring, to recurse plesantly dizzying.

Dan Byström

Have you ever tried counting sheep when trying to sleep? For twenty years I’ve been utilizing a different method. I’ve been trying to solve the following problem:

How do you build a balanced binary search tree from a stream of sorted data of unknown size?

That is, given a stream of the letters: ABCDEFGHIJKLMNO, construct the following tree:

H
/ \
D L
/ \ / \
B F J N

/

\

/

\

/

\

/

\
A C E G I K M O

directly by reading the characters one by one and place them DIRECTLY in the correct position WITHOUT using any intermediate storage.

I’m confident that this has been thoroughly researched at least half a century ago, but I have never wanted to look up any solution – what’s the fun in solving a crossword puzzle if you look at the solution first?

This sleeping strategy has been very successful, up until last week, when baby Ellen gave up a horrible cry in the middle of the night – and while trying to go to sleep again I remembered this old problem. Accidentally I solved it and now it’s useless as a sleeping pill!!! But since I have a history with this problem for twenty years now, I thought I had to write a little memorial over it before letting go.

This will also be a tribute to the most powerful solving method I know of: Divide and Conquer!

Background

Two decades ago, when I was a student (that was at a time when I could also be called both young and promising) taking a programming course I chose to write a record register application using Turbo Pascal 3.0. This was a time when CD players were just as common a Bly-ray players today (that is: most people were still waiting for the prices to drop). Vinyl was the name of the game. (Turbo Pascal 3.0, by the way, was an incredible piece of work; the whole environment consisted of one file with the whopping size of 38kb containing the full runtime system, the compiler and the source code editor! You read right: 38kb – not 38Mb!)

I wanted this application to be able to run smoothly with its data files residing on a floppy disc (and of course, memory consumption had to be low). I chose to use binary search trees to keep track of the albums, artists and song titles. This worked well and it was quite fun figuring out all the things that were not in the books, like: given an arbitrary node in a tree, how do you implement a “next” function that will bring you to the next node in sort order when the user hits the “next” button? Not completely obvious the first time you try to acquaintance yourself with the concept of trees. Adn then embarrasingly simples once you figured it out.

However, there was this one thing that I never bothered to figure out, but took an easy way out. That has nagged me ever since.

Prequisities

Let’s declare a simple binary tree node:

public class Node<T> where T : IComparable<T>
{
  public readonly T Data;
  public Node<T> Left;
  public Node<T> Right;
  public Node( T data )
  {
    Data = data;
  }
}

Then let’s define a data pump that will read data/records/entities (or whatever we like to call them) from some kind of stream. The important thing here is that there is no look-ahead facility:

interface IStreamingDataPump<T> where T : IComparable<T>
{
  T GetNextData();
  boolEOF { get; }
}

Maybe it helps our thinking if we have a concrete example implementation of the interface as well:

public class TestCharPump : IStreamingDataPump<char>
{
  string data = "ABCDEFGHIJKLMNO";
  int index = 0;
  public char GetNextData()
  {
    return data[index++];
  }
  public bool EOF
  {
    get { return index >= data.Length; }
  }
}

Obviously, this data pump will deliver the sorted characters A to O (in order) to a consumer, throwing an error if trying read past the last character.

The problem with inserting already sorted data

The standard way of inserting data into a binary tree:

public static Node<T> Insert( Node<T> tree, T data )
{
  while ( true )
  switch ( Math.Sign( tree.Data.CompareTo( data ) ) )
  {
    case 1:
      if ( tree.Left == null )
        return tree.Left = new Node<T>( data );
      tree = tree.Left;
      break;
    case -1:
      if ( tree.Right == null )
        return tree.Right = new Node<T>( data );
      tree = tree.Right;
      break;
    case 0:
      throw new Exception( "Duplicate keys!" );
  }
}

If we read all the data from our stream containing sorted data and use the method above:

Node<T> tree = _pump.EOF ? null : new Node<T>( _pump.GetNextData() );
while( !_pump.EOF )
  Node<T>.Insert( tree, _pump.GetNextData() );

…then we effectively end up with a linked list instead of a binary tree:

A
B
C
D
E
F
G
H
I
J
K
L
M
N
O

The standard(?) solution

So, what I’ve did in my record register application was something like this:

public Node<T> CreateTreeFromList( List<T> list, int left, int right )
{
  if ( left > right )
    return null;
  int middle = (left + right) / 2;
  Node<T> tree = new Node<T>( list[middle] );
  tree.Left = CreateTreeFromList( list, left, middle - 1 );
  tree.Right = CreateTreeFromList( list, middle + 1, right );
  return tree;
}

And then made use of that function like this:

List<Song> list = new List<Song>();
while( !_pump.EOF )
  list.Add( new Song( _pump ) );
Node<Song> tree = Node<Song>.CreateTreeFromList( list, 0, list.Count - 1 );

As you can see, I first read the whole stream into memory and then I construct the tree from there.

This will construct a perfectly balanced tree and the performance will be good. So what’s the problem? Well, actually there is no real problem, it is just that I feel that first storing the data in a vector, when it is a tree we want is an unnecessary step!!!

Another approach

So, given ABCDEFGHIJKLMNO, I want to read one character at a time and place it at the correct position in the balanced tree:

H
/ \
D L
/ \ / \
B F J N

/

\

/

\

/

\

/

\
A C E G I K M O

And again, the number of nodes must not be known in advance!

So let’s think about it. Reading A into a node seems uncomplicated. Then we should read B and somehow realize that it must be placed in such a manner that it will become the new tree root and that the previous tree root should become the left node of the new root.

Reading C and placing it as the right node of the current tree root seems fair enough, and the next node D should be handled similar to the B node… but now it gets strange, the E node should be placed in a void because F hasn’t been seen yet… and what if there IS no F node? If E is the last element in the stream it should come directly under D but not if the F appears.

And thinking about what will happen after the G is read seems even stranger, although there clearly isa pattern here if you say “EFG” will be handled exactly like “ABC”, and in fact, B and F behaves similar… so… what makes A and E alike… aaaa….

This is were I used to fall asleep. 🙂

Up until a night last week, then my brain got bored of having to start all over again with this problem and thought that it was best to settle the thing once and for all. There was no quantum leap – just the most common tool we use every day: divide and conquer!

With the aid of first one and then two helper functions the solution was suddenly crystal clear.

Reasoning away…

Firstly, one thing that disturbs my thinking a great deal is that we always have the risk of running into EOF. We can reduce that problem somewhat by a tiny helper:

public Node<T> ReadNodeFromStream()
{
  if( _pump.EOF )
    return null;
  return new Node<T>( _pump.GetNextData() );
}

Thinking about it, what we now have is a function that can create a zero or one element sized tree. Not bad really! 😉 Also note one other important fact: we are allowed to call this many times after reaching EOF – it will return null every time.Can we extend this into a three element sized tree (=height of two)?

public Node<T> thinkingOutLoud_1()
{
  Node<T> left = ReadNodeFromStream();
  Node<T> tree = ReadNodeFromStream();
  Node<T> right = ReadNodeFromStream();
  tree.Left = left;
  tree.Right = right;
  return tree;
}

Well, not quite… this will work if there are at least two elements in the stream, but otherwise it will crash. It can be fixed very easily though:

public Node<T> thinkingOutLoud_2()
{
  Node<T> left = ReadNodeFromStream();
  Node<T> tree = ReadNodeFromStream();
  if ( tree == null )
    return left;
  Node<T> right = ReadNodeFromStream();
  tree.Left = left;
  tree.Right = right;
  return tree;
}

Spot on! What about a next step… modifying the method to create a one or two level high tree?

public Node<T> thinkingOutLoud_3( int height )
{
  if ( height == 1 )
    return ReadNodeFromStream();
  Node<T> left = ReadNodeFromStream();
  Node<T> tree = ReadNodeFromStream();
  if ( tree == null )
    return left;
  Node<T> right = ReadNodeFromStream();
  tree.Left = left;
  tree.Right = right;
  return tree;
}

This method knows how to create two different sized trees, and it has no trouble at all dealing with EOF. But wait a minute… can’t we just as well write it this way, it wouldn’t change a thing if we call it with parameter of 1 or 2:

public Node<T> thinkingOutLoud_4( int height )
{
  if ( height == 1 )
    return ReadNodeFromStream();
  Node<T> left = thinkingOutLoud_4( height-1 );
  Node<T> tree = ReadNodeFromStream();
  if ( tree == null )
    return left;
  Node<T> right = thinkingOutLoud_4( height-1 );
  tree.Left = left;
  tree.Right = right;
  return tree;
}

But what if we call it with a desired height of three… or four…? Wow! I think that we safely can rename this method:

public Node<T> CreateTreeOfDesiredHeight( int height )
{
  if ( height == 1 )
    return ReadNodeFromStream();
  Node<T> left = CreateTreeOfDesiredHeight( height-1 );
  Node<T> tree = ReadNodeFromStream();
  if ( tree == null )
    return left;
  Node<T> right = CreateTreeOfDesiredHeight( height-1 );
  tree.Left = left;
  tree.Right = right;
  return tree;
}

AHHH! Recursion. (Said like Homer Simpson.)

If we knew the length of the stream we could create our tree like this:

Node<T> tree = createTreeOfDesiredHeight(
  (
int)Math.Ceiling( Math.Log( _pump.Length ) / Math.Log( 2 ) ) );

But we don’t. So exactly how does this help us anyway? It actually helps quite a lot – we’re almost done. We just have an iterative method left. And since iterative methods are boring, I won’t say much about it:

public Node<Data> CreateBalancedTree()
{
  Node<Data> left = ReadNodeFromStream();
  for ( int height = 1 ; ; height++ )
  {
    Node<Data> tree = ReadNodeFromStream();
    if ( tree == null )
      return left;
    tree.Left = left;
    tree.Right = CreateTreeOfDesiredHeight( height );
    left = tree;
  }
}

Well, maybe I should say something…

  1. Create a “left tree” of height one and then try to tuck on a root above it.
  2. If there are no more elements, then out left tree is the tree and we´re done.
  3. Otherwise create a right tree of the same height as our left tree
  4. Name our newly created tree the new “left tree” and repeat step two

Note that the key to this solution is step 3 – creating a subtree of a desired height. The part that made me pleasantly dizzy! 🙂

But… wait a minute… is this really it? Isn’t it strange that we now use completely different pieces of code to produce the left tree and the right tree??? Indeed it is! The CreateTreeOfDesiredHeight method is only a helper method I used to arrive at the solution!!! Since our CreateBalancedTree method is obviously fully capable of of creating the left tree and there is no difference between a left tree and a right tree:

The solution

public Node<Data> CreateBalancedTree( int maxHeight )
{
  Node<Data> left = ReadNodeFromStream();
  for ( int height = 1 ; height<maxHeight ; height++ )
  {
    Node<Data> tree = ReadNodeFromStream();
    if ( tree == null )
      return left;
    tree.Left = left;
    tree.Right = CreateBalancedTree( height );
    left = tree;
  }
  return left;
}

Iteration begone! And since a caller doesn’t care about any maxHeight, we can create our final (overloaded) method as such:

public Node<Data> CreateBalancedTree()
{
  return CreateBalancedTree( int.MaxValue );
}

Again, it is so embarrassingly simple once you figured it out! And this tiny piece of (almost) trivial code eluded me for so long!

OK, so now this was really really old news I guess. And there are things to criticize as well. First of all, what’s the point really? The method of loading the stream into a vector works faster AND produces a better tree, since my solution actually isn’t balanced (it is in our case where the number of elements is 2N-1) – it just has the same height as a balanced tree. But I’m fine with that.

There seems to be a lot of misconceptions around recursion out there, like that it is slow and uses an excessive amount of stack memory. Of course, calculating Fibonacci numbers using recursion instead of iteration should only be done as an intellectual exercise. But the fact that people think that it’s slow may be due to reasoning like this:

Fact: A recursive algorithm is slower than an iterative method when an equal iterative method exist.
Conclusion: All recursive algorithms are slow.

This is of course exactly the same reasoning as:

Fact: All cheeses are round and the moon is round.
Conclusion: The moon is made out of cheese.

Well, well. I don’t know how to end this, but I’ll give it a try anyway:

Try and be nice to people, avoid eating fat, read a good book every now and then, get some walking in, and try and live together in peace and harmony with people of all creeds and nations.

Monty Python
The meaning of Life

Posted in Nostalgia, Programming | 4 Comments »

Enums and Me, part zero

Posted by Dan Byström on April 17, 2007

enum Why
{
  Does,
  It,
  Not,
  Work,
  The,
  Way,
  I,
  Want
}

I have tried to make friendship with .NET Enums for years now, with varying result. My first mishap with them was that I desperately wanted to treat all enum values in the same manner by passing them “by ref” to a function (or I would have been happy with treating them just as integers, as could be done in VB6).

For example, I would have liked to be able to write this:

public void sampleFunction( ref Enum x )
{
}

Why why = Why.Not;
sampleFunction( ref why );  // illegal -- won't compile

If you wonder, my sampleFunction is part of a serialization scheme and when the enum is stored in XML I just want to treat it as an integer.

What I currently must do is:

public int sampleFunction( int x )
{
}

Why why = Why.Not;
why = (Why)sampleFunction( (int)why );

This is so ugly that it makes me wanna puke. If you can find a better way – please let me know! (Maybe I could use pointers, but that would require me to compile with /unsafe – and that is a road I don’t wanna travel again.)

There is another case when it would have been preferable to have an “enum base class” and that’s when you declare generic classes:

class myGenericClass<T> where T : enum  // no can do

This won’t work either and after googling around a little I know for a fact that I’m not the only one who misses this. The closest we can get seems to be:

class myGenericClass<T> where T : struct, ICompareable

Another thing that has also troubled me is how to map enums to the UI (most notably ComboBoxes). But lately I have arrived at a solution that’s really promising. Not perfect yet but promising. That will be my next post. Stay tuned.

Posted in Programming | Leave a Comment »

Zip tip

Posted by Dan Byström on March 3, 2007

Have you ever thought “it would be a good thing if my application could store this data a ZIP file”? Either in order to save space or to bundle several files into one (or both). But then you have hesitated because you realized you’d have to go out and buy an expensive ZIP compression/uncompression library and then learn how to use it (and maybe encounter some bugs or discover some unwanted side effects or…) and so you’ve dropped the whole thing.

If so, this must mean that you haven’t found SharpZipLib yet! It really is great! And OpenSource on top of that!

Moving data in and out of a ZIP file is a breeze, and it works completely with streams so you can, for example, work with MemoryStreams if you don’t want to work with disk based files. Highly recommended!

Some sample code that scans a ZIP file for two particular files and processes some data:

using ( FileStream fs =new FileStream( strZipFile,FileMode.Open,FileAccess.Read ) )
using ( ZipInputStream zis = new ZipInputStream( fs ) )
   
while ( (theEntry = zis.GetNextEntry()) != null )
       
switch ( theEntry.Name.ToLower() )
        {

            case
"upgrade.bin":
                abyteUpgrader =
new byte[zis.Length];
                zis.Read( abyteUpgrader, 0, (
int)zis.Length );
                break;
            case "upgrade.txt":
                byte[] ab = new byte[zis.Length];
                zis.Read( ab, 0, (
int)zis.Length );
                string s = Encoding.Default.GetString( ab );
                if ( Regex.IsMatch( s, @"\d{4}-\d{2}-\d{2}" ) )
                    fFoundNewVersion = s.CompareTo( strMinimumDate ) > 0;
                break;
        }

Posted in Programming | Leave a Comment »

I’m not alone

Posted by Dan Byström on October 31, 2005

http://www.iunknown.com/articles/2005/10/31/do-databases-rot-the-mind

I’ve been trying to tell people for over a decade now. Here, for example.

Posted in Programming | Leave a Comment »

Visual Studio 2003 Workaround

Posted by Dan Byström on September 27, 2005

Now when we’re all anticipating Visual Studio 2005, it’s time for me to crank out a tip regarding Visual Studio 2003.

If it’s a tip concerning you depends on whether or not you’re among the unfortunate ones like me, who sometimes displeases VS in such a way that it decides to teach me a lesson – by turning most of my Forms and UserControls into regular classes so that they cannot be designed anymore!

If this hasn’t happened to you – you’re among the fortunate ones and I congratulate you. (God’s little pet, right!?)

When this disaster has struck me before I have either resumed from a backup and just moved the latest changed files back into my project, or I have tediously begun to copy the text of each damaged file to the clipboard and then deleted the file from the project, then adding a new one with the same name and finally pasting the code back into place. Not very fun at all.

Now I just discovered that there is a much easier way to recover! I just open the .csproj file in Notepad. There I’ll find XML tags like these:

<File
    RelPath = "controls\vdCalendar.cs"
    SubType = "Code"
    BuildAction = "Compile"
/>
<File
    RelPath = "controls\vdCalendar.resx"
    DependentUpon = "vdCalendar.cs"
    BuildAction = "EmbeddedResource"
/>
<File
    RelPath = "dialogs\FAbout.cs"
    SubType = "Code"
    BuildAction = "Compile"
/>
<File
    RelPath = "dialogs\FAbout.resx"
    DependentUpon = "FAbout.cs"
    BuildAction = "EmbeddedResource"
/>

By walking through the list of files and changing each into the desired “SubType” it is much easier to recover:

<File
    RelPath = "controls\vdCalendar.cs"
    SubType = "UserControl"
    BuildAction = "Compile"
/>
<File
    RelPath = "controls\vdCalendar.resx"
    DependentUpon = "vdCalendar.cs"
    BuildAction = "EmbeddedResource"
/>
<File
    RelPath = "dialogs\FAbout.cs"
    SubType = "Form"
    BuildAction = "Compile"
/>
<File
    RelPath = "dialogs\FAbout.resx"
    DependentUpon = "FAbout.cs"
    BuildAction = "EmbeddedResource"
/>

But then again, I may be the only person alive who can annoy Visual Studio 2003 to this extent in the first place!

Posted in Programming | Leave a Comment »

More on nulls

Posted by Dan Byström on April 19, 2005

I’ve previously written a couple of blog posts on an idea of mine: the possibility to have language support for avoiding null reference exceptions.

I had planned to close the matter down in an article claiming that a “with” keyword could be implemented in C# with the ability to take a different path when the “with”-expession couldn’t be evaluated because a null reference exception would occur trying to do so.

Example:

with ( Order order = a.b.c.d.e )
{
    ...
}
else
{
   // either a or b or c or d or e is null and therefore
   // it would be pointless to enter the "with" block
}

Well, now it turns out that I’m not the only one having felt the need for something like this. It is implemented in the Cω research language:

  Cω takes this one step further with the behavior of returning null instead of throwing a NullReferenceException on accessing a field or property of a nullable type whose value is null.

I knew it was a good idea!!! 🙂

Posted in Programming | Leave a Comment »

.PNG + .NET

Posted by Dan Byström on December 14, 2004

The choice of .PNG as the file format for your Windows Forms graphics is superior to .BMP, .GIF, .JPG and .ICO.

The proof of this statement is left as an exercise to the reader. (Hint: Alpha channel.)

Posted in Programming | Leave a Comment »