In my recent post about Best Practices in Unit Testing, we took a brief glance at code coverage. Let us continue from there and dive into the surprisingly complex and counterintuitive world of coverage metrics.
(Note that this blog post does not aim at repeating the basic definitions of code coverage or the most common coverage metrics; Wikipedia has a concise summary of the fundamental ideas.)
The Coverage Zoo
Many courses and textbooks may leave you with the impression that there are only four or five relevant, well-defined metrics for code coverage. Typically, they will enumerate the infamous quartet C0/C1/C2/C3, and maybe add a subtle hint regarding one other example metric.
It is enlightening to see how many misconceptions and pitfalls these past two sentences already contain:
- There are in fact dozens of different coverage metrics, all of which can be considered relevant in one way or another. See, for instance, this overview of coverage metrics on the BullsEye website.
- There are many synonyms for most of these metrics.
- There are also many terms that are often claimed to be synonymous… except that they just aren’t. For example, you will easily find claims that statement coverage and line coverage are one and the same thing, but this is obviously not the case: Most programming languages allow for arbitrarily many statements per source line, and hence, the two metrics can result in arbitrarily different coverage results.
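To make the statements-versus-lines distinction concrete, here is a small Python sketch. It uses the standard `ast` module to count statements independently of source lines; the one-liner source string is of course a contrived example:

```python
import ast

# Three statements packed onto a single source line.
source = "a = 1; b = 2; c = 3"

tree = ast.parse(source)
num_statements = len(tree.body)        # the parser sees 3 statements
num_lines = len(source.splitlines())   # a line counter sees only 1 line

# A test executing this line yields 1/1 line coverage, but a statement-
# coverage tool works with a denominator of 3. Same code, different metrics.
print(num_statements, num_lines)  # 3 1
```

Pack enough statements onto few enough lines and the two percentages can drift arbitrarily far apart.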
- Some terms are subject to interpretation by the respective coverage tool. When talking about line coverage, for example, there are variations in what lines are counted — and which lines are to be ignored — in the overall coverage result. Just look at some example output of the GNU coverage tool gcov to get an idea of how this is a tool-specific aspect.
- Many authors and tutors prefer to stick to the terms C0/C1/C2/C3 because these technical-looking acronyms suggest that the corresponding metrics must be particularly well-defined, leaving no room for ambiguity. As usual with such assumptions, however, the opposite is the case: the Cx acronyms are used differently by different authors. See the aforementioned BullsEye page or this article about common misconceptions regarding code coverage [German] for additional information regarding this bizarre matter.
Typical False Claims
Although their mathematical definitions are often very short, coverage metrics are not as intuitive as one might first expect.
On the other hand, people are obviously tempted to try and explain coverage in short, catchy phrases. As a consequence, there are countless statements about coverage metrics out there that simply do not hold true.
Be careful when you read or hear sentences of the following variety:
- “Statement coverage means that you have covered all statements.”
This attempt at a simple, less mathematical definition of a coverage metric contains a fundamental type error. Code coverage is a numerical property (namely a fraction in the interval [0,1], usually expressed as a percentage). Therefore, the above sentence is somewhat similar to this one: “Temperature means that your water is boiling.”
- “All coverage metrics are defined according to the same pattern: X coverage is the fraction of Xs covered in your test runs divided by the total number of Xs in the respective piece of code.”
One might suspect that this holds true after hearing about, say, the Cx quartet. However, this attempt at a meta-definition does not hold true in general. Condition coverage, for instance, counts the outcomes of each condition rather than the conditions themselves. Depending on the respective unit-test code, the condition coverage of a function with a single condition is therefore 0/2, 1/2, or 2/2, as opposed to just 0/1 or 1/1. And things get even worse when we consider hybrid metrics such as condition/decision coverage or modified condition/decision coverage.
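The single-condition case can be sketched in a few lines of Python. Note that `condition_coverage` is a hypothetical helper for illustration, not the output of any real coverage tool:

```python
# A function with exactly one condition. Condition coverage counts the
# condition's *outcomes* (true and false), so the denominator is 2, not 1.

def sign_label(x):
    if x > 0:                 # the single condition
        return "positive"
    return "non-positive"

def condition_coverage(test_inputs):
    """Fraction of the condition's outcomes (True/False) exercised."""
    outcomes = {x > 0 for x in test_inputs}  # mirrors the condition above
    return len(outcomes) / 2

print(condition_coverage([5]))       # 0.5 -- only the True outcome
print(condition_coverage([5, -5]))   # 1.0 -- both outcomes
```

A test suite calling `sign_label(5)` alone thus reaches only 1/2 condition coverage, even though it executes every statement of the function's happy path.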
- “X coverage is stricter than Y coverage.”
This sentence may be true in many practical applications depending on X and Y, and it is therefore often heard even from highly qualified sources. But from a mathematical point of view, it is usually false. In fact, there is no total order among the different coverage metrics.
Here’s a practical example for the last point:
It is a common belief that decision coverage is a much stricter metric than the very basic function coverage. And yes, x% decision coverage will be much harder to reach than x% function coverage in an “average” software project (please do not ask me for a formal definition of that one) for all suitable values of x. But now consider an extreme piece of code: a highly optimized library whose pipelining-aware developers have successfully eliminated all branching except for a single single-condition decision, located in just one of the library's 100 functions.
In this extreme case, it will most likely be easier to reach 100% decision coverage than to reach 5% function coverage.
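The arithmetic behind this extreme case can be made explicit. The numbers below are a hypothetical sketch of the library described above, not measurements from any real tool:

```python
# Hypothetical library: 100 functions, exactly one of which still contains
# a single-condition decision; the other 99 are branch-free.
total_functions = 100
total_decision_outcomes = 2  # the lone decision can evaluate true or false

# A test suite that calls only the branching function, once per outcome:
functions_exercised = 1
decision_outcomes_exercised = 2

function_coverage = functions_exercised / total_functions                  # 0.01
decision_coverage = decision_outcomes_exercised / total_decision_outcomes  # 1.0

# 100% decision coverage, yet only 1% function coverage -- well below the
# 5% function-coverage threshold mentioned above.
print(function_coverage, decision_coverage)  # 0.01 1.0
```

So for this particular codebase, “decision coverage is stricter than function coverage” is plainly false, which is exactly why such claims cannot hold universally.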
How to Avoid the Pitfalls
To avoid the misconceptions and pitfalls outlined above, stick to these three fundamental hints:
- When defining a coverage goal for a piece of software, make sure that the coverage metric is defined unambiguously — and that it really corresponds to the actual metric applied by the coverage tool you plan to employ. In particular, do not assume anything about the tool without having tested it on some example code.
- When comparing different coverage metrics, always look at both typical cases and extreme cases in order to understand how the metrics really relate to each other. This will also give you an idea of how well they apply to the kind of code you expect to be written in the project.
- Think twice whenever you hear or read statements about coverage that claim universality — i.e., look out for phrases such as “all metrics”, “always stricter than” etc.