An In-Depth Look at Defensive Programming

29 October 2015
Reading time: 8 minutes

Motivation

In late September, I had the pleasure of attending the Zühlke Days 2015, a large in-house conference bringing together more than 600 colleagues from Zühlke sites all over Europe. There were many opportunities to talk to people in between sessions and after the conference dinner, and I had a lot of enlightening discussions, one of which was about the didactics of defensive programming: How would you teach the concept of defensive programming to developers who, for instance, encounter their first safety-critical project?

An important aspect that we agreed upon very quickly was this: Defensive programming is yet another concept in software development (or in life, for that matter) that is

  • easy to define in abstract terms,
  • rather easy to implement in practice once internalized,
  • not so easy to define in specific rules (beyond trivial examples), and
  • rather difficult to grasp in the beginning.

That is why I decided to write this blog post — an attempt to explain, in as concrete a fashion as I can, how defensive programming is done.

Deriving the Right Attitude

An abstract rationale of defensive programming could go like this: In many contexts (such as safety-critical systems), it does not suffice to write code in a way that only covers good-weather scenarios. Instead, when designing interfaces and implementing functionality, we should pay close attention to what could go wrong, and incorporate appropriate countermeasures.

The latter question is not only about possible runtime problems such as memory corruption; it particularly includes the use of your code by other people, e.g.:

  • Which parts of the interface could be misunderstood by others?
  • How could clients call your code wrongly?
  • Which pitfalls could future maintainers of your code step into?

Our goal, of course, is not to spend ages making up the most exotic error scenarios, or to clutter up the source code with as many countermeasures as possible. It should be easy to see that defensive programming is most useful when performed in a risk-driven fashion. Hence, it all comes down to the following approach: Anticipate the error scenarios (particularly those involving wrong use of your code) that pose the highest risk, where risk is the combination of (a) probability of occurrence and (b) severity of consequences.

Important Error Scenarios, and How to Deal with Them

As stated above, there are many ways of using code wrongly. This is simply because there are many ways of using code — and each of these can go wrong in case of a misunderstanding. Hence, the widely accepted concept of clean code is interesting with respect to defensive programming: It aims at making code easier to comprehend and thus harder to misunderstand or use wrongly. Similarly, coding standards and language restrictions such as MISRA C can help reduce the probability of code being used wrongly. In what follows, however, we want to look at specific cases instead of general concepts: What are common errors, and how can they be mitigated?

Prevent or Detect Bad Calls

The single most straightforward example of defensive programming, it seems, lies in protecting a function against invocations with invalid arguments. A common and useful strategy against such bad calls is to clearly specify the function’s interface contract and to include corresponding checks at the beginning of the function’s implementation. In fact, this concept is so well established by now that several programming languages support pre- and postconditions as part of their syntax, for example the D language:

ulong factorial(ulong n)
in
{
    assert(n <= 20); // Because 20! still fits where 21! won't
}
out (result)
{
    // Postcondition checks...
}
body
{
    // Implementation...
}
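
In languages without dedicated contract syntax, such as C, the same idea can be approximated with plain assertions. The following is only a minimal sketch: the name Factorial and the chosen postcondition are assumptions made for illustration, and the checks vanish when the code is compiled with NDEBUG defined.

```c
#include <assert.h>
#include <stdint.h>

/* Computes n! for 0 <= n <= 20.
 * Precondition:  n <= 20, because 21! no longer fits into 64 bits.
 * Postcondition: result >= n (n! is never smaller than n).
 * Note: Factorial is a hypothetical name chosen for this sketch.
 */
uint64_t Factorial(uint64_t n)
{
    /* Check preconditions: */
    assert(n <= 20u);

    /* Multiply 2 * 3 * ... * n (empty product for n < 2): */
    uint64_t result = 1u;
    for (uint64_t i = 2u; i <= n; i++)
    {
        result *= i;
    }

    /* Check postconditions: */
    assert(result >= n);
    return result;
}
```

Unlike the D contracts, these checks are part of the function body, so callers only learn about the contract from the documentation comment.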

It is important to note, however, that there are several other ways of calling the functions of a software component in erroneous ways — some of which often lead to much nastier bugs than invalid arguments do.

One particularly important aspect to consider is concurrency: Is your function reentrant, and if not, would you consider protecting it from being invoked several times in parallel? This can often be achieved with a mutex. Similarly, if a function is supposed to be called from certain contexts only (e.g., only from the UI thread), this can usually be checked by an appropriate precondition. Another helpful trick in the face of concurrent code with complex control flow lies in introducing artificial restrictions; as the number of potentially parallel activities decreases, the resulting code becomes easier for the human brain to analyze.
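
As a rough illustration of guarding a non-reentrant function, consider the sketch below. Note that it swaps in a different technique than a mutex: a C11 atomic flag that detects overlapping invocations and rejects them instead of making the second caller wait. All names (RunCalibration, g_calibrationRuns, g_calibrationBusy) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state that must not be modified concurrently. */
static int g_calibrationRuns = 0;

/* Set while RunCalibration() is executing. */
static atomic_flag g_calibrationBusy = ATOMIC_FLAG_INIT;

/* Runs a (hypothetical) calibration step.
 * Contract: Not reentrant. Returns false without touching any state
 * if another invocation is still in progress, true otherwise.
 */
bool RunCalibration(void)
{
    /* Detect overlapping invocations: */
    if (atomic_flag_test_and_set(&g_calibrationBusy))
    {
        return false;
    }

    /* Critical section, entered by at most one caller at a time: */
    g_calibrationRuns++;

    /* Allow the next invocation: */
    atomic_flag_clear(&g_calibrationBusy);
    return true;
}
```

Whether a rejected call should be reported to the caller, as here, or treated as a programming error via an assertion depends on the project; a mutex would be the natural choice if the second caller is supposed to wait.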

So far, we have only considered bad calls to single functions, but many software units (classes, modules, you name it) are stateful and have a certain lifecycle. This often implies that the functions of the item can be called in a wrong order: For instance, a client may accidentally call specific functions of a stateful object before initializing or configuring the poor thing. Analogously, a client may accidentally try to reinitialize an already initialized object. The probability of errors of this kind can be reduced drastically by a precise, well-documented interface contract, and again, adherence to this contract can be checked via preconditions in the implementation.
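
One possible way to check such a lifecycle contract is to store the current state in the object and verify it at the beginning of each function. The sketch below assumes a made-up heater controller; the type and function names are hypothetical, and wrong call order is reported via the return value so that the behaviour can be observed in a test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lifecycle states of a (hypothetical) heater controller. */
typedef enum {
    HEATER_UNINITIALIZED = 0,
    HEATER_READY,
    HEATER_RUNNING
} HeaterState_t;

typedef struct {
    HeaterState_t state;  /* Current lifecycle state */
} Heater_t;

/* Contract: To be called exactly once on a zero-initialized object.
 * Returns false if the object has already been initialized. */
bool Heater_Init(Heater_t *heater)
{
    assert(heater != NULL);
    if (heater->state != HEATER_UNINITIALIZED)
    {
        return false;  /* Rejected: already initialized */
    }
    heater->state = HEATER_READY;
    return true;
}

/* Contract: Requires a prior successful call to Heater_Init().
 * Returns false if the object is not in the READY state. */
bool Heater_Start(Heater_t *heater)
{
    assert(heater != NULL);
    if (heater->state != HEATER_READY)
    {
        return false;  /* Rejected: not initialized, or already running */
    }
    heater->state = HEATER_RUNNING;
    return true;
}
```

In a project that regards wrong call order as a programming error rather than a recoverable condition, the state checks could just as well be assertions.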

Intermission: Let’s Practice!

In order to exercise the above concepts in a concrete manner, let us review the following snippet of C99 code.
(For the sake of brevity, we consider the organization of the code into files or modules to be out of scope.)

/* A point in time w.r.t. the uptime of the device, in seconds. */
typedef double  Uptime_sec_t;

/* A temperature in Kelvin. */
typedef double  Temperature_K_t;

/* Record type for a single support point on a temperature profile.
 * Contract: If the valid flag equals false, no other field is to be evaluated.
 */
typedef struct {
    bool             valid;  /* Whether this entry is valid */
    Uptime_sec_t     time;   /* Time component of the point */
    Temperature_K_t  temp;   /* Temperature component... */
} TemperatureProfileSlot_t;

/* A temperature profile based on any given number of support points.
 * Invariant: nSlots equals the number of allocated slots.
 */
typedef struct
{
    uint8_t                   nSlots;  /* The number of support point slots. */
    TemperatureProfileSlot_t  *slots;  /* The actual slots (nSlots many). */
} TemperatureProfile_t;

/* Initializes a given profile with the default two-point temperature profile
 * specified in requirement UR-132.
 *
 * Interface contract:
 * In: The caller needs to provide a properly allocated profile with at least two slots.
 * Out: The profile contains the default two-point temperature profile.
 */
void TemperatureProfile_Init(TemperatureProfile_t *profile)
{
    /* Check preconditions: */
    assert(profile != NULL);
    assert(profile->nSlots >= 2u);
    assert(profile->slots != NULL);
    /* Reset all the slots: */
    for (uint8_t i=0u; i<profile->nSlots; i++)
    {
        profile->slots[i].valid = false;
    }
    /* Climb from 20 to 100 degrees Celsius within the first minute: */
    profile->slots[0].valid = true;
    profile->slots[0].time = (Uptime_sec_t)0.0;
    profile->slots[0].temp = (Temperature_K_t)293.15;
    profile->slots[1].valid = true;
    profile->slots[1].time = (Uptime_sec_t)60.0;
    profile->slots[1].temp = (Temperature_K_t)373.15;
}

Questions of personal style aside, what do you see in the above code snippet with respect to defensive programming?

Let us first point out some good news:

  • The inline documentation provides actual information (rather than, say, simply paraphrasing the identifier of a function), but it is also brief (no novellas).
  • The inline documentation explicitly mentions invariants and interface contracts, making correct use more likely.
  • The inline documentation refers to external specifications where the details might be hard to follow otherwise.
  • The code makes use of dedicated data types for physical quantities (also in assignments), leaving little room for any terrible misinterpretation of values.
  • The code uses helpful names and identifiers in general; readers should not be misled.
  • In the function body, each piece of code is preceded by a comment that explains the intention of that piece.
  • There are explicit precondition checks in the function, and they match the documented contract.
  • The function first invalidates all slots, thus establishing a clear starting state.

However, one could also argue that there is room for improvement:

  • Nothing is said about concurrency: Is the function reentrant, or does it need to be protected?
  • The semantics of the data structure for temperature profiles is not explained, leaving some questions. For example, are the valid support points in a profile always supposed to be kept in chronological order?
  • It is not exactly defensive to set a record’s valid flag to true before its other fields have been set.

And, of course, some things may just be a matter of taste or speculation:

  • Readers may easily be tricked into believing that the field name "temp" (in the support point record) stands for "temporary". On the other hand, this is unlikely given the dedicated data type.
  • The record type also contains a field with a very generic name: "time". Maybe we could be more specific? On the other hand, we are talking about a point in time, and the dedicated data type already specifies the unit and the origin of measure.
  • The magic values 293.15 and 373.15 in the function’s body may be somewhat hard to maintain. Why don’t we make it explicit that they stem from 20 and 100 centigrade, for example by a simple conversion function or properly named constants?
  • The last six lines of the function body constitute a case of code duplication. This may be a good moment to introduce a better mechanism for assigning values to a support point, e.g., an assignment function that encapsulates this functionality and also handles the valid flag in a graceful fashion.
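
Some of these suggestions can be sketched in a few lines. The following is one possible shape, not the only one: CelsiusToKelvin, TemperatureProfileSlot_Set and the type Temperature_degC_t are hypothetical additions, and the check for non-negative uptime is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef double  Uptime_sec_t;
typedef double  Temperature_K_t;
typedef double  Temperature_degC_t;  /* Hypothetical, not in the original */

typedef struct {
    bool             valid;  /* Whether this entry is valid */
    Uptime_sec_t     time;   /* Time component of the point */
    Temperature_K_t  temp;   /* Temperature component of the point */
} TemperatureProfileSlot_t;

/* Converts a temperature from degrees Celsius to Kelvin. */
static Temperature_K_t CelsiusToKelvin(Temperature_degC_t celsius)
{
    return (Temperature_K_t)(celsius + 273.15);
}

/* Assigns a support point to a slot.
 * The valid flag is cleared first and set last, so the slot is never
 * marked valid while its other fields are only partially written.
 */
static void TemperatureProfileSlot_Set(TemperatureProfileSlot_t *slot,
                                       Uptime_sec_t time,
                                       Temperature_K_t temp)
{
    /* Check preconditions: */
    assert(slot != NULL);
    assert(time >= 0.0);  /* Assumption: uptime is never negative */
    assert(temp >= 0.0);  /* Nothing is colder than 0 Kelvin */

    /* Write the payload while the slot is marked invalid: */
    slot->valid = false;
    slot->time = time;
    slot->temp = temp;
    slot->valid = true;
}
```

With these helpers, the last six lines of TemperatureProfile_Init could shrink to two calls such as TemperatureProfileSlot_Set(&profile->slots[1], 60.0, CelsiusToKelvin(100.0)). Note that the write order only protects against partially written slots within a single thread; making it visible to other threads in that order would require additional synchronization.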

Don’t Mislead Future Maintainers

Software maintenance is another typical way in which code is used (beyond invocation): Future maintainers may need to modify the code due to changing requirements, bug reports or redesigns. As a consequence, we should not only make the interfaces of our software units as hard to misunderstand as possible, but apply exactly the same general rule to their implementation.

One of the most important rules of defensive programming that I can think of in the context of software maintenance is that the intention behind every reasonably sized piece of code must be clear. Otherwise, a future maintainer might replace such a piece of code by something that they deem more elegant or less error-prone, but that simply is not semantically equivalent — introducing bugs into a previously flawless implementation. Clear intentions are especially relevant in the face of non-trivial case distinctions, complex expressions, complicated algorithms or data structures, and uncommon language concepts.

It is easy to see that the above rule can be applied to larger software artifacts, too: What exactly is the intention of a given function, and what is the purpose of the package?

Intermission: Let’s Practice Again!

Consider the following C function:

static void fill(int *numbers, int k)
{
    assert(numbers != NULL);
    for (int i=0; i<k; i++)
        numbers[i] = i;
    for (int i=1; i<k; i++)
    {
        int j = rand() % (i+1);
        int tmp = numbers[i];
        numbers[i] = numbers[j];
        numbers[j] = tmp;
    }
}

And now compare it to this implementation:

/* Stores a random permutation of [0,..,nNumbers-1] in an array
 * of nNumbers many integers.
 */
static void CreateRandomPermutation(int numbers[], int nNumbers)
{
    /* Check preconditions: */
    assert(numbers != NULL);
    assert(nNumbers >= 0);
    /* Initialize array to [0,1,..,nNumbers-1]: */
    for (int i=0; i<nNumbers; i++)
        numbers[i] = i;
    /* For each index, swap the item at that index with the item at a
     * random position [0..index]. (Hint: No effect for index 0.) */
    for (int src=1; src<nNumbers; src++)
    {
        const int dst = GetRandomNumberIn(0, src);
        SwapNumbers(numbers, src, dst);
    }
}

Which of the two functions would you prefer to maintain, and why?

In order to make things a bit more interesting, I won’t provide my own opinion on the matter, but instead the answer a Zühlke colleague gave me after reviewing these two pieces of made-up code:

  • The first example clearly suffers from poor nomenclature and lack of documentation. Probably the worst name is "k"; even "n" would have been more intuitive.
  • Without any inline comment, it may seem like a mistake for the second loop to start at 1… although it is not.
  • The second example illustrates very well how proper naming and how the extraction of elementary functions make code easier to understand.
  • The second assertion (nNumbers >= 0) is not required on a purely technical level, but may help uncover other problems.
  • Clearly, the second implementation is far easier to read, but it lacks a true description of the how and why: Each inline comment states what the following lines are supposed to do, but it remains a bit of a mystery why these steps are taken.

The last remark is particularly important: Given a perfect random number generator, the function will create each permutation with exactly the same probability. It is rather unfortunate that this important fact is not stated anywhere in the inline documentation.
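
To illustrate what a documented "why" could look like, here is a self-contained sketch of the second function. The loop appears to be a variant of the Fisher-Yates shuffle, and the comment spells out the uniformity argument. The bodies of GetRandomNumberIn and SwapNumbers are assumptions, since the snippet above only calls them without showing their definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns a random integer in [lo..hi], as uniform as rand() itself.
 * Rejection sampling avoids the bias of a plain "rand() % range".
 * (Hypothetical helper; its original definition is not shown.) */
static int GetRandomNumberIn(int lo, int hi)
{
    assert(lo >= 0 && lo <= hi);
    assert((hi - lo) <= RAND_MAX);
    const unsigned span = (unsigned)RAND_MAX + 1u;
    const unsigned range = (unsigned)(hi - lo) + 1u;
    const unsigned limit = span - (span % range);
    unsigned r;
    do
    {
        r = (unsigned)rand();
    } while (r >= limit);
    return lo + (int)(r % range);
}

/* Swaps the items at indices a and b. (Hypothetical helper.) */
static void SwapNumbers(int numbers[], int a, int b)
{
    const int tmp = numbers[a];
    numbers[a] = numbers[b];
    numbers[b] = tmp;
}

/* Stores a random permutation of [0,..,nNumbers-1] in an array
 * of nNumbers many integers.
 *
 * Why this works (variant of the Fisher-Yates shuffle): Before the
 * iteration for index src, the items at [0..src-1] are in uniformly
 * random order. The iteration moves the new item at src to one of the
 * src+1 positions in [0..src] with equal probability, so afterwards
 * the items at [0..src] are in uniformly random order, too. Hence,
 * given an ideal random number generator, all permutations are
 * equally likely.
 */
static void CreateRandomPermutation(int numbers[], int nNumbers)
{
    /* Check preconditions: */
    assert(numbers != NULL);
    assert(nNumbers >= 0);
    /* Initialize array to [0,1,..,nNumbers-1]: */
    for (int i = 0; i < nNumbers; i++)
    {
        numbers[i] = i;
    }
    /* Extend the shuffled prefix by one item per iteration.
     * (Index 0 is skipped: a single item is already shuffled.) */
    for (int src = 1; src < nNumbers; src++)
    {
        const int dst = GetRandomNumberIn(0, src);
        SwapNumbers(numbers, src, dst);
    }
}
```

With such a comment in place, a maintainer who is tempted to "simplify" the range of the random index can see which property would be lost.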

Concluding Remarks

You may have noted, attentive reader, that we restricted ourselves to the two general subjects of bad invocation and tricky maintenance in the above account. So what about memory corruption or hardware defects? Well, those are interesting aspects, but two facts made me leave them out in this post. Firstly, the countermeasures are rather straightforward (checksums or even error-correcting codes for memory corruption, appropriate self-tests for hardware defects). Secondly, and this is more important, they are discussed much more often and hardly ever forgotten, at least in my experience. As a consequence, I recommend that we put some emphasis on the more subtle problems of bad invocation or tricky maintenance.
