Matchmaking

Posted by Dan Byström on December 29, 2013

This post contains a synopsis of an algorithm I used to cut down execution time of the following code:

foreach (var mline in mlines)
  foreach (var xline in xlines)
    if (xline.AIds.Contains(mline.AId) && xline.BIds.Contains(mline.BId) && xline.CIds.Contains(mline.CId))
    {
      // do something useful...
      matchCount++;
    }

Given that there are 100,000 “xlines” and an average of 4 items in the AIds/BIds/CIds arrays – the execution time drops from 2 hours and 54 minutes down to 47 seconds when matching 1,000,000 “mlines”.

Since I was about to write this down anyway and it contains an unusual data structure – dictionaries of hash sets – I thought that I could just as well publish it.

Given problem: match something called “m-lines”:

#	A-Id	B-Id	C-Id	Relevant data…
0	34	56	45	…
1	12	39	776	…
2	19	56	92	…
3	576	243	646	…

with something called “x-lines”:

#	A-Ids	B-Ids	C-Ids	Relevant data…	Match with m-lines
0	12,34,59	18,39,56,89	45,78	…	0
1	12,19,59,117	39,56,243	78,646	…
2	19,34,35,576	18,56,89	45,92,776	…	0,2
3	34,35,117,576	56,243	45,78,646,776	…	0,3

The rule for a match is simple: an m-line matches an x-line if the A-Id of the m-line is present in the A-Ids of the x-line and the same for the B-Id in the B-Idsand the C-Id in the C-Ids.

This is pretty straightforward and as long as we have just a few hundred lines this will be just fine. But as the number of lines increases, this will become painfully slow. On may laptop, 1,000,000 m-lines combined with 100,000 c-lines takes almost 3 hours. And this is supposed to happen in a user interaction!

The solution I came up with is to first pre-processes the c-lines and produces the following lookups:

A Dictionary		B Dictionary		C Dictionary
key int	value HashSet<int>	key int	value HashSet<int>	key int	value HashSet<int>
12	0,1	18	0,1	45	0,2,3
19	1,2	39	0,1	78	0,1,3
34	0,2,3	56	0,1,2,3	92	2
35	2,3	89	0,2	646	1,3
59	0,1	243	1,3	776	2,3
117	1,3
576	2,3

The simple idea is that this gives all the indexes of x-lines containing a certain Id, for the A’s, B’s and C’s respectively. All the x-lines that matches a certain m-line is is then simply the intersection of these three hash sets. And this is a fast operation.

foreach (var mline in mlines)
{
  HashSet h1, h2, h3;
  if (!alookup.TryGetValue(mline.AId, out h1))
    continue;
  if (!blookup.TryGetValue(mline.BId, out h2))
    continue;
  if (!clookup.TryGetValue(mline.CId, out h3))
    continue;
  putSmallestHashSetFirst(ref h1, ref h2, ref h3);
  foreach (var idx in h1.Where(_ => h2.Contains(_) && h3.Contains(_)))
  {
    var xline = xlines[idx];
    //do something useful...
    matchCount--;
  }
}

So there it is: a nice speed up of a factor 222!

It is not only the users waiting time we have saved here – we have also saved a lot of electricity by cutting down the computing time. Fast algorithms should indeed be classified as eco-friendly! 🙂

Therefore, I propose the following New Year’s resolutions for 2014:

try to eat less meat
try to travel by public transportation whenever possible
try to write more efficient algorithms

And then we have helped a little tiny bit to save the environment! Happy New Year!

This entry was posted on December 29, 2013 at 21:54 and is filed under Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Dan Byström’s Bwain

Blog without an interesting name

Recent Posts

Top Posts

Categories

Subscribe

Archive

Matchmaking

Leave a comment Cancel reply

Dan Byström’s Bwain

Blog without an interesting name

Recent Posts

Top Posts

Categories

Subscribe

Archive

Matchmaking

Share this:

Related

Leave a comment Cancel reply