Sunday, September 11, 2011

A recent example of the value and purpose of unit tests.

Just last week I was approached by our BA (or the closest thing we have to one), who was tasked with documenting our testing process, and in particular our unit testing strategy. He wanted to know what our unit tests did, and how he could express their value to the client. I had just finished tracking down an interesting issue that surfaced as I wrote unit tests to exercise a recent feature, and it made the perfect example of what the unit test suite did and how it added value to the project.

A bit of background into the issue:
The business requirements were that a service would be contacted to retrieve updated events for an order. All events that weren't already associated with the order would be associated with the order. The application would report on any new events associated with the order.
*Events have no distinct key from the service; they are matched up by date/time and event code.

The code essentially did the following (the method returns the new events as an IEnumerable):


var recentEvents = EventService.GetEventsForOrder(order.Key);
var newEvents = recentEvents.Except(order.Events);
foreach (var newEvent in newEvents)
{
    order.Events.Add(newEvent);
}
return newEvents;
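
A note on the matching: Except() determines matches by equality, so for the date/time-and-code matching described above to work, the event type presumably overrides Equals() and GetHashCode() on those two fields (or an IEqualityComparer is supplied to Except()). A minimal sketch with hypothetical names, not the actual project types:

using System;

// Hypothetical event type; equality is based on event code plus date/time,
// since the service provides no distinct key.
public class OrderEvent
{
    public string Code { get; set; }
    public DateTime OccurredAt { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as OrderEvent;
        return other != null
            && Code == other.Code
            && OccurredAt == other.OccurredAt;
    }

    public override int GetHashCode()
    {
        return (Code ?? string.Empty).GetHashCode() ^ OccurredAt.GetHashCode();
    }
}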

Based on the requirements I scoped out several tests to ensure that:
Given the event service returns no events, no new events are added or returned.
Given the event service returns new events and the order contains no events, all events are added and returned.
Given the event service returns new events and the order contains one with matching code and date/time, only the new unmatched events are added to the order and returned.
Given the event service returns an event with matching code but different date time, the new event is added to the order without replacing the existing event and returned.
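
To give a sense of the shape of these tests, here is roughly what the second scenario looked like. The test fixture, the Moq-style mocking, and the type names are illustrative assumptions rather than the project's actual code:

[Test]
public void NewEvents_OrderHasNoEvents_AllEventsAddedAndReturned()
{
    // Arrange: the mocked event service returns two events for the order.
    // Order is assumed to initialize an empty Events list.
    var order = new Order { Key = "ORD-1" };
    var serviceEvents = new List<OrderEvent>
    {
        new OrderEvent { Code = "PICKED", OccurredAt = new DateTime(2011, 9, 1, 8, 0, 0) },
        new OrderEvent { Code = "SHIPPED", OccurredAt = new DateTime(2011, 9, 2, 14, 30, 0) }
    };
    var eventService = new Mock<IEventService>();
    eventService.Setup(s => s.GetEventsForOrder(order.Key)).Returns(serviceEvents);
    var sut = new OrderEventUpdater(eventService.Object);

    // Act
    var newEvents = sut.RefreshEvents(order);

    // Assert: both the order and the returned set should contain the two new events.
    Assert.AreEqual(2, order.Events.Count);
    Assert.AreEqual(2, newEvents.Count());
}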


Adding the second test I was surprised to find it failed. The code is de-coupled and the EventService is mocked out to return two events. There were two main asserts: one to assert the order contained the new events, and the second that the return value contained the new events. The order assert showed it contained the two new events, however the returned set had a count of 0. I'm actually pleased when a test does fail, but this was a momentary "WTF?" followed shortly by a "well, duh!". I was returning the IEnumerable from the Except() expression, and because LINQ queries are deferred, that expression is re-evaluated every time it is enumerated. I had already added the new events to the list driving the Except(), so any further iteration of the IEnumerable would see no items since the matches were now in the list.

The issue was easy enough to fix with a ToList() call, and I felt it warranted a simple comment to explain why the fixed code was written that way, in case someone later went in and tried refactoring it back to just the enumerable.
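
The corrected block ended up roughly like this, same fragment as above, with the explanatory comment added:

var recentEvents = EventService.GetEventsForOrder(order.Key);
// Materialize the new events now: Except() is deferred, and adding them to
// order.Events below would otherwise leave nothing when re-enumerated.
var newEvents = recentEvents.Except(order.Events).ToList();
foreach (var newEvent in newEvents)
{
    order.Events.Add(newEvent);
}
return newEvents;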

This gave me a perfect situation to demonstrate to the BA exactly how the unit tests reflected requirements within the application, and how they served to guard those requirements from unexpected future changes. I integrated the two tests with the working code to show how the CI environment reported that all tests passed, then re-introduced the buggy assumption to show how the tests picked up the error.

The other interesting thing to note was that I also showed the BA our NCover results, and was a bit surprised to see that with just the first two unit tests that block of logic was reporting a test coverage of 100%. However, I showed that the "touch" count was mostly 1's, indicating that a single test was touching many parts of the code only once. That is a warning flag to me that the code really isn't being exercised. If I were solely concerned with test coverage percentage I could have left the tests at that, but I knew the other two scenarios were not represented, so I demonstrated how the stats changed by adding them. The test coverage remained 100%, however the touch counts increased to show a minimum of 3 and an average touch count of 6 to 9. It's not a perfectly reliable indication that the coverage really is complete, but by thinking through the scenarios to exercise the code, I'm quite confident that the code is reasonably guarded from the unintentional introduction of bugs.

Wednesday, September 7, 2011

Unit Tests Aren't Expensive

A short time after developers start adopting unit tests, and by this I mean committing to unit tests and aiming for good test coverage, grumblings will start. Someone will want to make a design change, and by doing so they break a large number of existing tests. All of a sudden it sinks in that now that there are unit tests in addition to the existing production code, changes like this look twice as big or bigger. This can be a crisis point for a project. Having unit tests appears to mean that code is less flexible and change is significantly more expensive. This attitude couldn't be further from the truth. Unit tests aren't expensive. Change is expensive.

The first thing we need to look at is why the unit tests are breaking. Often when developers start writing tests, the tests can be overly brittle merely due to inexperience. In these cases it is a learning experience, and hopefully the fixed-up tests will be a little more robust than they were before. Still, even with well-sculpted unit tests guarding the code's behaviour, developers will find reasons to justify changing that behaviour. The presence of unit tests, now broken unit tests, doesn't make that change expensive; it just makes the true cost of that change visible. Making behaviour changes to an existing code base (one that has already committed to some form of acceptance testing, and is likely already out in production) is a very costly exercise. Breaking unit tests demonstrate what the cost of that change is up-front. Without the unit tests, the deposit you pay up front for the change is smaller, but the rest of the cost is charged with interest in the form of risk and technical debt. Unit tests show you exactly what depends on the behaviour as it was when you first committed to it. You have a clear map not only to fix the tests, but to ensure that you can fix the code documented by the tests to suit the desired new behaviour. Without that map you're driving blind, and only regression tests are going to find the spots you miss if you're careful, or your customers will find them if you're not.

So, how do you keep the cost of change under control? An obvious answer would be to identify all change before coding starts, and allow no change until coding finishes. This would be a motto for BDUF (Big Design Up Front) advocates, but in the realm of typical business software this really doesn't work. By the time coding starts and the customer actually sees an implementation, they are already thinking about changes. The Agile solution is to implement only the minimum functionality you need to meet requirements as you go. Often the biggest contributor to these kinds of large behaviour changes is development teams trying to save time in the future by implementing code that isn't necessary now. Classic examples include over-engineering solutions, scope creep, or committing to a particular technology/framework on the basis that it may solve problems you might encounter, just not right now. Another way to avoid large change costs is by embracing constant refactoring. Get in the habit of including small improvements in your code as you implement new features. Introduce base classes for common code, and delete stale code with a vengeance. Any improvement that reduces the overall size of the code base should be a very high priority. Don't leave things for later, because that is debt that starts accumulating interest as new functionality comes to depend on it, or reproduces it for the sake of consistency.

On a side note:
Writing software can be like a game of chess. You don't walk up to a match with a memorized set of moves that will guarantee victory. This isn't tic-tac-toe. You've got to be prepared to adapt to the opponent that is your customer, your manager, your budget; and you've got to be prepared to sacrifice pieces in the hopes of winning the game. A developer with over-engineered code is like a player that values each piece on the board too highly. A lot of time and sweat has been spent on the code, and you cannot justify sacrificing it when the demands of the game deviate from the original plan. The result is putting even more sweat and time into the solution, and the resulting choice of moves doesn't win you the game; it loses it, and the customer walks away.