Category Archives: Uncategorized

Deduplication. Or: How I learned to stop worrying and love the HashSet<T>

Let’s say, for example, you have a relatively large set of data.
For our example – it’s about a half million records. They’re all in memory, and you used CsvHelper to load them, because you don’t like reinventing the wheel. Especially a very well thought out, tried, and proven wheel.
(Sidenote: Trying to do a .Take(BatchSize) on the CsvHelper.Read() method caused worlds of pain for me on rusty disks. YMMV)

So, tangent aside: You have 500,000 in-memory objects. These objects were read from an unreliable source, so there may be duplicates. Duplicates are bad, so you want to remove them.

Option 1:
Create a new list, and manually add records from the original list, as long as they aren’t already in the new list (using your IEqualityComparer<T> or whatever).
This is great, up until you hit about 50k records and suddenly you’re scanning a huge list for duplicates for every additional record. This was pretty fast up to about 10k records. Speed: On^2. I gave up after about 180 seconds on a 3.7ghz i7 with SSD and 16gb of memory. It was <200k records into it.

Option 2:
This sounds great at first – until you realize that it’s doing the same thing as Option 1. Again, I killed the process after a few minutes of waiting. It’s entirely possible that I somehow implemented this wrong (apparently you need to manually have IEquatable<T> implemented – I had only done an override on .Equals and .GetHashCode)

Option 3:
new HashSet<T>(IEnumerable<T>, IEqualityComparer<T>)
No lie, this bad boy did the job in a fraction of a second.
Removed 297143 duplicates from array size 454002. Final size: 156859. Elapsed: 00:00:00.2096235

Now, my EqualityComparer was only looking at some fields (a few properties combine to yield a “unique” data record) – and the way the HashSet<T>(IEnumerable<T>, IEqualityComparer<T>) constructor works from a layman’s perspective – is that it builds up an adequately sized hashtable, and starts inserting items into it (top to bottom from the original list). If it encounters an item that’s already been inserted (pretty sure this is a HashTable under the covers – works like a Dictionary where the object’s HashCode is the key… only faster), it simply skips adding any records that have matching HashCodes.

So, have some code:

public class Deduper
	public FancyData[] Dedupe(IEnumerable fancyDataArray)
		var instantArray = fancyDataArray.ToArray(); //Multiple enumeration of IEnumerable?
		HashSet uniques = new HashSet(instantArray, new FancyDataEqualityComparer());
		var array = uniques.Distinct().ToArray(); //.Distinct() does nothing. I just like it there.
		return array;
	public class FancyDataEqualityComparer : IEqualityComparer
		public bool Equals(FancyData x, FancyData y)
			if (x == null && y == null) return true;
			if (x == null || y == null) return false;
				x.KeyProperty1 ==y.KeyProperty1
				&& x.KeyProperty2 == y.KeyProperty2
				&& x.KeyProperty3 == y.KeyProperty3
				&& x.KeyProperty4 == y.KeyProperty4;

		public int GetHashCode(FancyData obj)
			var name = typeof (FancyData).Name;
			if (obj == null) return name.GetHashCode();
				string.Format("{0}:{1}:{2}:{3}:{4}", name, obj.KeyProperty1, obj.KeyProperty2,
							  obj.KeyProperty3, obj.KeyProperty4).GetHashCode();

Seriously… 210 milliseconds!

Tagged , , , ,

Posting complex object to MVC3 controller using jquery

Not as easy as it should be. I fought with this for way too long.
Basically, you should be able to construct the object in the $.post function call. But, for whatever reason, you cannot. Well, I couldn’t – your luck might be different.

this fails:

$.post("/SomeController/ActionThatAcceptsPost", {
Property1: someValue,
Property2: 3.0
}, function(response) {
//Make happy noises

but this works:

var thisIsTheParameterName = {
Property1: someValue,
Property2: 3.0
$.post("/SomeController/ActionThatAcceptsPost", thisIsTheParameterName
function(response) {
//moar happy noises

So, in short – make a variable to pass in a complex object to your MVC3 controller and it starts working.

Oh, and not sure if it matters, but the variable was the same name as the parameter on the Action, and the class was all [DataContract] decorated. Those two may be overkill, but I didn’t feel like taking them out to check.


Circuit Breaker pattern

So I was talking with a coworker and we were discussing this occasional bug we get wherein a system that processes a massive number of requests (very quickly) will sometimes lose connection to certain databases (luckily, not our logging database).

However, the fact that it can still talk to logging means that within a few seconds we may have hundreds of thousands of error records being written. Combine this with the fact that our logging system is global for all our applications, and suddenly you have the entire system being brought down (since it cannot connect to the logging db, which is expected to “always be up”) simply because a small component with high activity has an error.

Enter the Circuit breaker pattern. Most developers have probably already imagined something of this nature – but it’s basically coding to watch for X number of errors in Y seconds, and triggering an “offline” status so that any further requests are queued until the service is back up. 

Part 2 is that you have a small trickle of requests (1 per 5 seconds, 1 per 30 seconds) that are allowed through [Or alternately, you put the code into an alternate track where a specific type of request is made to check status of the ‘downed’ service]. Once the service comes back up, you change the status back to good and the flow of requests resumes.

… I might consider slowly increasing the flow of requests so that you don’t hammer it and put it back out of service again as soon as it comes up. But that’s just me.

Anyhow, that’s the circuit breaker in a nutshell.

The end result is that you have fewer erroneous calls. The only real drawbacks I see are potentially longer delay before you know the service is back up, and it’s a bit more to code.



Tagged , ,

Migrating Shelvesets between branches in TFS

So, if you work on a branch-per-release cycle, sometimes things take longer than expected.
Here’s a quick way to move a shelveset from one branch to another:
tfpt unshelve - Unshelve into workspace with pending changes

Allows a shelveset to be unshelved into a workspace with pending changes.
Merges content between local and shelved changes. Allows migration of shelved
changes from one branch into another by rewriting server paths.

Usage: tfpt unshelve [shelvesetname[;username]] [/nobackup]
[/migrate /source:serverpath /target:serverpath]

shelvesetname The name of the shelveset to unshelve
/nobackup Skip the creation of a backup shelveset
/migrate Rewrite the server paths of the shelved items
(for example to unshelve into another branch)
/source:serverpath Source location for path rewrite (supply with /migrate)
/target:serverpath Target location for path rewrite (supply with /migrate)

Serverpath is $/Project/DirectoryStructure

Don’t use nobackup. Bad idea. Backups never hurt anybody. Also, you should run this from within a directory that you’ve got mapped (source directory works great), otherwise you might have an issue with TFPT not being able to figure out which workspace, even if there’s only one.

Tagged , ,

HttpHandler won’t register

Or at least, the HttpHandler code won’t run..

If you have an HttpHandler in some semi-legacy code that should be firing, but instead you’re getting a 404 … well, you might be on IIS7 or Windows 7… Try using rather than … thanks go to


Yes/No options for UIAlertView

Be sure to keep a reference to the UIAlertView somewhere else, otherwise you’ll get a nullref exception when they click on it.

alert = new UIAlertView ("Storage Room", "Would you like to search the storage room for an item?", null, "Yes", new[] { "No" });
			alert.Dismissed += delegate(object alertSender, UIButtonEventArgs args) {
				if (args.ButtonIndex == 0) {
					var card = game.DrawCard ();
					if (card != null) {
						ShowItemScreen (card.Item, true);
			alert.Show ();