IWOMM (It Works On My Machine)!


Say you have a test suite for your project. Go ahead, say it… right now, out loud, so a random passerby can hear you and begin rumors about the person muttering nonsense to themselves. Now say the test suite is all nice and well factored, giving you 80+% coverage on your project. Say you’re proud of the test suite, and then say you add a new test case to it that works beautifully until it makes its way over to either the build server or a colleague’s workstation. Hi, I’m Cliff. You’re probably here because you have a broken test case. I’m here because I’ve been there and done that.

I was recently in a discussion with a coworker when this topic came up, and I thought I’d post a little something about it. The frustration most people feel with that random failure in a test that always works so well on their machine is that it seems incredibly hard to track down. If you know a couple of basic principles, however, finding and fixing these failures can be as easy as… as easy as the Kool-Aid man bustin’ through your living room wall. “Oh yeah!” Technically, forcing a glass pitcher full of red liquid through plaster (or concrete, like in the 80s) without spilling a drop is not easy, but I’ll leave that as a physics exercise for the more adept among us.

“But it works on my machine!!!”

…they always say in disgust as the green bar flips to a shade of crimson, indicating they were not as careful as they claimed to be during the pull request. It’s okay though. It happens when you make a change, like adding a new test or adding logic to an existing test, which creates a random failure in a test case that has absolutely nothing to do with your changes. Many times the problem is not the fault of the new test or the test modifications that seem to trigger the failure. The cause is almost always one of two things, which I’ll elaborate on now.

Check your environment!
Many test cases have a tendency to rely on a particular environment. Maybe it’s an environment variable, a version of the runtime (Node.js, Java, Ruby, etc.), or even a compiler optimization. If the environment where the failure occurs differs even slightly from yours, it’s highly possible that the slight variation is triggering the failure. I recall a Java test that ran a version of Tomcat and failed because of the ImageIO library, which was (or was not) present on the build server. The JVM there was one point revision ahead of (or behind; I can’t recall exactly) my local version. There were also times I discovered that an executable in the system’s $PATH would get picked up during the test and dramatically change its behavior. This can happen when you run on OS X vs. Linux.
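When I suspect the environment, the first move is to make both machines tell me what they actually look like so I can diff the output. Here’s a minimal sketch (the class name and the choice of keys are my own; add whatever your project is sensitive to):

```java
// Dump the environment details that most often differ between machines.
// Run this (or call it from test setup) on both the passing and the
// failing machine, then diff the two outputs.
public final class EnvironmentDump {
    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("java.home     = " + System.getProperty("java.home"));
        System.out.println("os.name       = " + System.getProperty("os.name"));
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("user.timezone = " + System.getProperty("user.timezone"));
        System.out.println("PATH          = " + System.getenv("PATH"));
    }
}
```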

Even something as subtle as a configuration file difference can have an effect. Some applications make use of machine-specific configuration files to locate things like a database, an XML parser, or a virtual machine. The database itself is an environmental resource, too. Though you should never have unit tests that depend on a database, many people make a habit of it. Anything coming from a local or external database should always be considered first when trying to isolate a failure.
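One way to take the machine out of the equation is to pin any machine-specific configuration inside the test itself and restore it afterwards. A sketch in JUnit 4, assuming the code under test reads a hypothetical app.db.url system property:

```java
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class ConfigPinningTest {
    private String originalDbUrl;

    @Before
    public void setUp() {
        // Pin the value the code under test will read, instead of letting it
        // fall through to whatever this particular machine has configured.
        originalDbUrl = System.getProperty("app.db.url");
        System.setProperty("app.db.url", "jdbc:h2:mem:testdb");
    }

    @After
    public void tearDown() {
        // Put the machine back exactly the way we found it.
        if (originalDbUrl == null) {
            System.clearProperty("app.db.url");
        } else {
            System.setProperty("app.db.url", originalDbUrl);
        }
    }

    @Test
    public void usesThePinnedDatabase() {
        // ... exercise the code that reads app.db.url ...
    }
}
```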

Order of Operations
These are my favorite problems because they’re dirt simple to fix! In a perfect world, your test cases would pass regardless of the order in which they execute. In practice, however, this is often not the case. Check the actual order your test cases run in. If the failing test passes when run on its own or when run in a different order, then the failing test itself may not even be the problem. The problem is in a test that runs before the failing test. Consider the following:

Your machine
A, B, C, D, E, F -> ALL PASS

OTHER machine
F, E, C, D, B, A -> D FAILS

The problem in this scenario is immediately obvious to me and would take me all of 5 minutes of dev time to fix. Can you identify what to fix?

Either test case E or F is causing a failure in D, because D runs successfully when they are not run first. I would run F before D, then E before D, to see which of them (possibly both) causes the failure. Once I’ve identified the offending test case, I would simply add a tearDown step that reverts any state established during either the setup or the test itself. Usually this is a very easy pattern of matching any file, database, or socket open call with a corresponding close in the tearDown, followed by nulling out any global variables and resetting all mock objects. On very rare occasions I have to tilt my head and squint a little harder to find the problem.
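In JUnit 4 terms, that tearDown pattern looks something like this (the fields and the SomeHelper class are stand-ins for whatever your suite actually opens and caches):

```java
import java.io.BufferedWriter;
import java.sql.Connection;
import java.util.List;
import org.junit.After;
import org.junit.Test;
import org.mockito.Mockito;

public class LeakyNeighborTest {
    private Connection connection;   // opened in setUp (not shown)
    private BufferedWriter logFile;  // opened in setUp (not shown)
    @SuppressWarnings("unchecked")
    private List<String> mockList = Mockito.mock(List.class);

    @Test
    public void doesSomethingWithTheResources() {
        // ... the test body that uses connection, logFile, and mockList ...
    }

    @After
    public void tearDown() throws Exception {
        // Match every open with a close...
        if (connection != null) { connection.close(); connection = null; }
        if (logFile != null) { logFile.close(); logFile = null; }
        // ...null out anything global/static...
        SomeHelper.cachedInstance = null;
        // ...and reset mocks, since some mock frameworks keep global state.
        Mockito.reset(mockList);
    }
}

// Stand-in for a test helper that caches state in a static field.
class SomeHelper {
    static Object cachedInstance;
}
```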

Take the above example. Let’s say test D passes when it runs after E. Immediately I know that test F is the offender. I would run F, D to confirm. I would then open up F and look for any global variable use. In Java this means anything static, whether in the test case, the tested source, or any test helper files. I would make sure these global variables were set to null in F’s tearDown method. (If F doesn’t have a tearDown method, I would add one.) I would then look to see if there were any mock objects used in F. Failing to reset (and even null out) mock objects can trigger problems, because mock frameworks sometimes make use of globals under the covers.
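To make the failure mode concrete, here’s an illustration of the leak itself, with invented names. Run TestD alone and it passes; run TestF first and TestD fails, exactly the IWOMM pattern:

```java
// Shared global state: the root cause. (Each test class would normally
// live in its own file; they're shown together here for brevity.)
class Registry {
    static String activeUser;  // static == shared across every test in the JVM
}

public class TestF {
    @org.junit.Test
    public void recordsTheActiveUser() {
        Registry.activeUser = "cliff";
        org.junit.Assert.assertEquals("cliff", Registry.activeUser);
        // No tearDown: "cliff" is still sitting in the static field
        // when the next test class runs.
    }
}

public class TestD {
    @org.junit.Test
    public void startsWithNoActiveUser() {
        // Passes when D runs first; fails whenever F has already run.
        org.junit.Assert.assertNull(Registry.activeUser);
    }
}
```

The fix is exactly the tearDown pattern shown earlier: F nulls out Registry.activeUser when it finishes.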

Lastly, I would look for any external resources used, such as files, databases, and/or network connections. You should never use these in a unit test case, but there are times when you feel the need to include them. I would reset the state of any external resources in tearDown, then re-run the offending tests in the problematic order. Usually I don’t need to look any further and all my problems go away, but on occasion…
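When a test genuinely has to touch the filesystem, I’d lean on something like JUnit 4’s TemporaryFolder rule, which hands each test a fresh directory and deletes it in an automatic teardown:

```java
import java.io.File;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
import static org.junit.Assert.assertTrue;

public class ScratchFileTest {
    // A fresh directory per test, removed automatically afterwards, so no
    // file state can leak into whichever test happens to run next.
    @Rule
    public TemporaryFolder tmp = new TemporaryFolder();

    @Test
    public void writesAScratchFile() throws Exception {
        File scratch = tmp.newFile("scratch.txt");
        assertTrue(scratch.exists());
        // No manual cleanup needed; the rule deletes scratch.txt for us.
    }
}
```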

Concurrency
Test cases which make use of concurrency are the most difficult to fix. Usually I use a recipe of unwinding the concurrency and making the test synchronous. This is an exercise I’ll have to explain in another post. In short, it means decoupling the concurrent piece of your logic and testing it separately. This piece is usually in the form of a callback, but again, I’ll have to cover that separately. (It’s a little more involved.) In fact, this is the only true way to fix concurrency-triggered failures. Every other approach is merely a hack. Many people do tricky things like adding sleeps, delays, and timers. Some people go so far as to move the asserts out of the concurrent block of code! This is extremely inappropriate because it changes the nature of what you are testing. If you can’t take the time to unwind the concurrent piece, you’re probably better off removing the test case entirely. It’s not entirely right, but a missing test case is always better than a faulty one.
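I’ll save the full recipe for that other post, but here’s a minimal sketch of the unwinding idea, with invented names: the callback stays a thin trampoline, the logic moves into a plain synchronous method, and the test calls that method directly. No threads, no sleeps, no races.

```java
import java.util.concurrent.Executor;
import java.util.function.DoubleConsumer;
import org.junit.Assert;
import org.junit.Test;

// Hypothetical feed that delivers prices asynchronously.
interface PriceFeed {
    void onTick(DoubleConsumer listener);
}

class PriceAlerter {
    boolean alertTriggered;

    // Production wiring: the callback only forwards to the real logic.
    void watch(PriceFeed feed, Executor executor) {
        feed.onTick(price -> executor.execute(() -> handleTick(price)));
    }

    // The interesting logic, decoupled and fully synchronous.
    void handleTick(double price) {
        if (price > 100.0) {
            alertTriggered = true;
        }
    }
}

public class PriceAlerterTest {
    @Test
    public void triggersAnAlertAbove100() {
        PriceAlerter alerter = new PriceAlerter();
        alerter.handleTick(101.0);  // call the logic directly, synchronously
        Assert.assertTrue(alerter.alertTriggered);
    }
}
```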

That’s all I have for now. I hope this long rant makes sense to somebody and/or saves time debugging. In summary, follow these principles of testing code at all times:

  • Never let your tests depend on an external resource like a file, network connection, or database.
  • Do not use global variables in your test code or your production code.
  • Decouple your concurrent logic from the place it’s called. (This actually means refraining from using inline callbacks, anonymous inner classes, or lambda expressions for the logic you need to test.)

If you follow these rules, regularly run your tests in random order (see the sketch below), and code a decent tearDown method in each test, then you should rarely, if ever, experience an IWOMM error.
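If your framework supports it, that random ordering can be switched on directly. JUnit 5, for example, ships a random method orderer (JUnit 4 users can get a similar effect from their build tool’s run-order options):

```java
import org.junit.jupiter.api.MethodOrderer;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestMethodOrder;

// Shuffle this class's test methods on every run, so hidden order
// dependencies surface on your machine before the build server finds them.
@TestMethodOrder(MethodOrderer.Random.class)
class ShuffledOrderTest {
    @Test void caseA() { /* ... */ }
    @Test void caseB() { /* ... */ }
    @Test void caseC() { /* ... */ }
}
```

JUnit 5 can also apply random ordering suite-wide via its junit.jupiter.testmethod.order.default configuration parameter, if you’d rather not annotate every class.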