To err is human…but to really foul things up, you need a computer
I’m a software developer with over 20 years of experience developing games, apps and cloud services, and in that time I’ve seen a lot of bugs. So I thought it would be fun to share a little of what’s going on behind the scenes when, for example, your app “needs to restart”, or things are going so slowly you may as well send your file by pigeon. I will take you through a handful of the weird and wonderful bug types I have come across, together with the hunt to track each one down and the perfect fix, so let’s get started!
*Payload Medium, Evilness High, Windup factor Low
Have you ever seen an error message along the lines of “Access violation, system memory may be corrupted”? Well, the good news is that this is hardly ever a problem with the memory hardware. It’s far more likely to be a bug in the code.
I was the lead coder working on a LEGO Creator game for young children when I came across the most bizarre example of this bug. The game was populated by Harry Potter themed LEGO mini-figures, and one of the scenes was set inside “Hagrid’s hut”. We put a whole bunch of characters in there: Harry, Hagrid, Ron, some rats, quite a party. Then all of a sudden some very child-unfriendly action played out. Ron’s arms got very long and thin and started stabbing through the walls of the hut, his legs bloated out, and even the hut itself got in on the action, with the walls shaking all over the place. This was more “Terminator vs. The Exorcist” than LEGO Harry Potter… a great improvement maybe, but not one that sat well with the target demographic…
The hunt in this example was very tricky. The bug only showed itself after the game had been running for a while, only in a “release build” (not the normal developer environment), and only on a few PCs in the office. We caught it by lying in wait with a remote debugger for the action to kick off, then “pausing” the game with the debugger to analyse its state, which is tricky with a release build.
We got a priest in and sent a robot back from the future, sorted.
With this bug, as with most others, once the problem has been understood it isn’t normally much work to fix. In this case there was a bug in some clean-up code.
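The details of that clean-up bug aren’t given here, but one classic way for clean-up code to scramble memory is a stale handle: a slot gets freed and reused while some other part of the program still holds the old reference and keeps writing through it. The toy sketch below assumes that scenario. Python has no raw pointers, so a small pool of numbered slots stands in for memory, and `SlotPool`, `demo` and the rat and Ron data are all made up for illustration.

```python
class SlotPool:
    """A tiny fixed-size pool standing in for raw game memory (hypothetical)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.free = list(range(size))

    def allocate(self, value):
        # Hand out the first free slot and return its index as a "handle".
        index = self.free.pop(0)
        self.slots[index] = value
        return index

    def release(self, index):
        # Clear the slot and make it the next one to be reused.
        self.slots[index] = None
        self.free.insert(0, index)


def demo():
    pool = SlotPool(4)

    rat_tail = pool.allocate({"owner": "rat", "length": 1.0})
    stale_handle = rat_tail  # the animator keeps its own copy of the handle

    # Clean-up frees the slot, but (the assumed bug) nobody tells the animator.
    pool.release(rat_tail)

    # The freed slot is reused straight away for something else.
    ron_arm = pool.allocate({"owner": "Ron", "length": 1.0})

    # The animator thinks it is still stretching the rat's tail.
    pool.slots[stale_handle]["length"] = 9.5

    return pool.slots[ron_arm]
```

Calling `demo()` hands back Ron’s arm with a length that was only ever meant for the rat. In a real game the same kind of mistake writes over raw memory, which also fits a bug that only shows up after the game has been running for a while: nothing goes wrong until a freed slot actually gets reused.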
It works on my machine
*Payload Low, Evilness Medium, Windup factor High
Sometimes you come across bugs that are not. It looks like a bug, smells like a bug, but turns out to be a twist of fate. These are wind-ups. The most recent wind-up I looked at followed the deployment of a new version of our Horizons web service to a staging server. The response time for the service was dire, so something had to be up. The requests were coming back slowly, some slower than others, and the internet connection seemed fine. We restarted the service and things went quickly again, for a while.
After some head scratching I thought to try some of our other services, and caught them misbehaving too. So the issue had nothing to do with the new version. It turned out to be a problem in the office: our internet connection was being throttled back by our ISP, so all web responses were being delayed, some more than others. This sort of coincidence happens pretty rarely, but when it does it can be fun. And don’t get me started on virus checkers!
Get a decent ISP! Get a decent virus checker!
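The move that cracked this one, checking the other services before blaming the new release, can be sketched in a few lines of Python. This is only a rough illustration: the function names, the “twice the usual time” threshold and the sample timings are invented, and only the Horizons name comes from the story above.

```python
from statistics import median


def slow_services(timings, baselines, factor=2.0):
    """Names of services whose median response time exceeds factor x their usual time."""
    return {
        name
        for name, samples in timings.items()
        if median(samples) > factor * baselines[name]
    }


def diagnose(suspect, timings, baselines, factor=2.0):
    """Say whether the suspect is slow on its own or in company."""
    slow = slow_services(timings, baselines, factor)
    if suspect not in slow:
        return "healthy"
    if slow - {suspect}:
        # Other, unchanged services are slow too: look for a shared cause.
        return "shared"
    return "suspect"


# Made-up response times in seconds, purely for illustration.
timings = {
    "horizons": [2.1, 3.4, 2.8],
    "other_service": [1.9, 2.5, 4.0],
}
baselines = {"horizons": 0.2, "other_service": 0.3}

verdict = diagnose("horizons", timings, baselines)
```

With these numbers the verdict is `"shared"`, which is the cue to stop staring at the new release and start looking at whatever the services have in common, such as the office network.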
To Err Is Human
*Payload Low-High, Evilness Low, Windup factor Low
The vast majority of bugs are silly things: slip-ups that get through testing or, more often than not, unusual setups or data that weren’t foreseen by the developers or the testers.
These are the sorts of things that get reported most often via technical support, along the lines of “if I do x and y then z happens and ruins my day”. The challenge is to reproduce the issue, in a debug environment if necessary.
Normally, once the problem’s been found, the fix suggests itself, doesn’t it, Sir Alan? … You’re fired!