A Gentle Introduction to Robotics

Testing

In the “Best Practices” section of the previous chapter, I mentioned that you have a big hole in your knowledge and experience labeled “Testing.” We’re going to do something about that now.

Testing comes in many varieties and many flavors within each variety. Boris Beizer’s various books, in particular Software Testing Techniques, are probably as close to definitive as one can get, despite being three decades old; they are well worth owning, and probably should be regarded as mandatory reading for anyone aspiring to program, as opposed to code. Almost any software quality attribute can be tested – functionality, performance, resource consumption, internationalization and localization support, even maintainability – but this section will concentrate on functional testing – testing that the software does what it is supposed to do.

There are three major types of functional testing: unit testing, which exercises individual units of code in isolation; subsystem (or integration) testing, which exercises assemblages of units working together; and system testing, which exercises the deployed system as a whole.

There are variations on the themes, of course. Instrumentation mocks might be used in a subsystem test, for example, or instrumentation jackets might be wrapped around product code. Typically, unit and subsystem testing not only is not performed with the software in situ in a deployment, but cannot be so performed. When the engine is installed in a car, you can’t measure shaft horsepower because you can’t get at the shaft: there’s a car in the way. A test harness, a structure permitting access to test points and injection of data and commands in a controlled manner, must be used.

User acceptance testing, used for bespoke (single customer, or “custom”) software, is typically a variety of system testing performed in the actual deployment environment. Typically user acceptance testing is less structured (and substantially less comprehensive) than the system testing done by the vendor: if software were an automobile, the user acceptance test would be a test drive. Test drives typically do not involve test tracks, hard braking, maximum acceleration, skid steering, drifting, 180° and 360° power spins, three continuous hours of operation at the engine redline, and other aspects the vendor tested – or at least should have tested.

An initially puzzling aspect of testing is that a number of well-tested, functionally correct components added together does not necessarily yield a functionally correct system. Just because all the pieces work separately does not mean they will work with one another. Any assemblage of components has emergent behavior (there’s that “emergent behavior” thing again). Just as a Ford fuel pump will probably not work in a Toyota car – despite neither the fuel pump nor the car being defective in any way – that two pieces of software work in isolation does not mean they will work together. Emergent behavior is delicate, and undesirable emergent behavior is frequently astonishing in its creativity.

Unit Testing

Unit testing is exercising a unit of code (typically a single class) in isolation from other code. Now, it is not really in isolation from other code: the unit test driver and the unit test framework are other code, and interact with the unit of code under test. But the unit of code under test is isolated from other product code: it is not sheltered by other product code protecting its inputs, and it cannot offload its work to other product code. In some respects, unit testing is like a job interview: typically, job interviews are for individuals, not entire teams. A unit test is a very thorough, very detailed job interview for the job of being depended upon by the rest of the product.

There are multiple goals often ascribed to unit tests. Tests might be tasked with executing every line of code in the unit under test, impacting a metric called code coverage. Tests might be tasked with exercising every outcome of every decision point in the unit under test, impacting a metric called branch coverage. These are good things to do, and the coverage metrics are both useful and meaningful. But these are actually not goals: they are constraints. Further, these are white box tests: you write the tests with full knowledge of exactly how the unit under test works. You are testing the implementation, not the function. Unit testing traditionally uses black box tests: they test the interface – the usage of the unit under test – without regard to how the interface is implemented.

So typically the goal of unit testing is to document and enforce the behavioral contract of the unit under test, to describe and check every designed, observable behavior of the unit under test – and not to check accidental behavior. When the test puts x, y, and z into the collection, the collection enumerator returns x, y, and z in some order, and the unit test very carefully does not assume an ordering. This leaves the implementation of the class – the “this is how this class does things” – completely unconstrained, while enforcing the “this is what this class does” behavioral contract.
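
As a concrete illustration, here is a minimal sketch of such a contract-only test, written with the NUnit framework we will meet shortly; the Bag<string> class and its Add method are hypothetical stand-ins for “the collection”:

using NUnit.Framework;

[TestFixture]
public class TestBagContract
{
    /// <summary>
    /// Assertion: enumerating the bag yields exactly the items that were added, in some order.
    /// </summary>
    [Test]
    public void TestEnumerationYieldsAddedItems ()
    {
        // Bag<T> is a hypothetical collection class that implements IEnumerable<T>.
        Bag<string> bag = new Bag<string> ();
        bag.Add ("x");
        bag.Add ("y");
        bag.Add ("z");
        // Is.EquivalentTo compares contents while ignoring order: the designed behavior
        // (the added items come back) is enforced; the accidental behavior (their order) is not.
        Assert.That (bag, Is.EquivalentTo (new[] { "x", "y", "z" }));
    }
}

The Is.EquivalentTo constraint compares contents while ignoring order, so a later change in enumeration order (an accidental behavior) does not break the test.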

There are several advantages in documenting and enforcing the behavioral contract: the contract becomes something the rest of the product can depend upon; regressions are caught as soon as they are introduced; and the implementation remains free to change so long as the contract still holds.

However, behavioral testing is both mind-numbingly tedious and demanding of creativity, which is a bad combination when humans are involved: tests are never exhaustive, even when exhaustiveness is intended. A simple boundary test – say, a range test on an int – would be exhaustive were every possible int value checked, but that would take a long time and a great deal of typing. (Unit tests typically use literals and no loops or computation: so that exhaustive boundary test has 4.3 billion Asserts in a row.) So boundary tests are typically “one less than the minimum gets rejected, the minimum and minimum plus 1 get accepted, the maximum minus 1 and the maximum get accepted, one over the maximum gets rejected, and we call it good.” Behavioral testing also requires scaffolding in the form of test drivers (methods that invoke the unit under test) and what are called “mocks” or “shims” (code that replaces the product code invoked by the unit under test). The needs to observe behavior, to stimulate behavior, and to inject mocks into the code under test also have implications as to how the unit under test must be designed: it must be designed to be testable. Testability typically involves the creation of test points (where behavior can be observed), data injection points (where behavior can be stimulated), and an arrangement for dependency injection (where mocks can be introduced).
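
Here is a minimal sketch of that boundary convention, again using NUnit; the Thermostat class, its SetPoint property, and its 5 to 35 degree range are assumptions made up for the illustration:

using System;
using NUnit.Framework;

[TestFixture]
public class TestThermostatSetPointBoundaries
{
    /// <summary>
    /// Assertion: SetPoint accepts 5 through 35 and rejects values outside that range.
    /// </summary>
    [Test]
    public void TestSetPointRange ()
    {
        Thermostat thermostat = new Thermostat ();
        // One less than the minimum gets rejected.
        Assert.Throws<ArgumentOutOfRangeException> (() => thermostat.SetPoint = 4);
        // The minimum and minimum plus 1 get accepted.
        Assert.DoesNotThrow (() => thermostat.SetPoint = 5);
        Assert.DoesNotThrow (() => thermostat.SetPoint = 6);
        // The maximum minus 1 and the maximum get accepted.
        Assert.DoesNotThrow (() => thermostat.SetPoint = 34);
        Assert.DoesNotThrow (() => thermostat.SetPoint = 35);
        // One over the maximum gets rejected, and we call it good.
        Assert.Throws<ArgumentOutOfRangeException> (() => thermostat.SetPoint = 36);
    }
}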

.NET includes support for unit testing (among other things). One type of a .NET project is a test project. Two popular unit test frameworks are MSTest and NUnit, and .NET includes support for both. Both MSTest and NUnit are valuable, but they have a major philosophical difference. In MSTest, you have access to the class’s internals, and you typically test whether the class works the way the design says it should; in NUnit, you typically do not have access to the class’s internals, and you typically test whether the class acts the way the specification says it should. MSTest’s focus is on white box testing, where the test concentrates on the fidelity of the translation from design into code. NUnit’s focus is on black box testing, where the test concentrates on the fidelity of the translation from requirements into code. Each has its place; but they are not synonyms. Here, we are going to use NUnit.

NUnit has an overall approach as to how testing progresses, and some specialized vocabulary associated with that. Typically a test project consists of an assembly containing tests, and other separate assemblies that contain the classes from which objects are to be exercised. A test runner selects tests to be run – by default, all of them – and orchestrates running the tests. The individual classes of the assembly containing tests are called fixtures: these are organized by their namespaces. Within each fixture namespace, there can be a setup fixture which creates the environment needed for tests within that namespace and does teardown of that environment when the tests are complete; and there are potentially many test fixtures which contain individual tests. Each test fixture can have its own setup and teardown methods, which run before any tests and after all tests in the fixture, to establish and remove any specialized environment needed by the test fixture as a whole. Further, each test fixture can contain per-test setup and teardown methods, which are run before and after each individual test in the fixture, to establish and remove any environment needed by the individual tests. When a single test needs a particular unique environment, the test itself is in charge of setting up and tearing down that environment. Finally, each test fixture has potentially many tests, which create and exercise instances of the classes under test. All of these special purpose constructs are identified by applying attributes to the appropriate classes and methods: [SetUpFixture] and [TestFixture] for the classes, [OneTimeSetUp] and [OneTimeTearDown] for the fixture setup and teardown methods, [SetUp] and [TearDown] for the per-test setup and teardown methods, and [Test] for the test methods themselves.
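
A skeletal sketch of that vocabulary, with placeholder names standing in for real fixtures and tests, looks like this:

using NUnit.Framework;

namespace Example.Test
{
    [SetUpFixture]
    public class NamespaceSetUpFixture
    {
        [OneTimeSetUp]
        public void SetUpNamespaceEnvironment ()
        {
            // Runs once, before any test anywhere in the Example.Test namespace.
        }

        [OneTimeTearDown]
        public void TearDownNamespaceEnvironment ()
        {
            // Runs once, after every test in the Example.Test namespace has finished.
        }
    }

    [TestFixture]
    public class ExampleFixture
    {
        [OneTimeSetUp]
        public void SetUpFixtureEnvironment ()
        {
            // Runs once, before any test in this fixture.
        }

        [OneTimeTearDown]
        public void TearDownFixtureEnvironment ()
        {
            // Runs once, after all tests in this fixture.
        }

        [SetUp]
        public void SetUpTest ()
        {
            // Runs before each individual test in this fixture.
        }

        [TearDown]
        public void TearDownTest ()
        {
            // Runs after each individual test in this fixture.
        }

        [Test]
        public void TestSomething ()
        {
            // An individual test: creates and exercises an instance of a class under test.
            Assert.Pass ();
        }
    }
}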

The setup and teardown facilities are typically not used for exercising classes and methods that are more or less self-contained and do not interact with their environment: unit testing anything in System.Math or System.Collections.Generic, for example, would not require setting up or tearing down anything. But a class that, for example, queries a database would need some swaddling with setup and teardown methods: its fixture might need to create a test database with known contents, and certainly would want to direct the class’s attention to the test database rather than the production database. (Unless, of course, you want to fabricate a bestselling book.)

There are some conventions around unit tests. Unit tests are typically supposed to be independent of one another: the order in which they run should not matter, and whether any particular test does or does not run should not affect the other tests. Tests’ environments are set up by the set up methods, not other tests. Tests should create a new instance of the class under test, set its state as necessary, test a single behavior, and discard the instance when they have checked the behavior. Tests for a particular class are typically gathered themselves into a single test fixture associated with that class. Each test checks a single assertion about the class under test. There are both “positive” and “negative” tests: a “positive” test checks that some statement about the class under test is true, and a “negative” test checks that some statement about the class under test is false. Tests are said to “pass” when their assertion is true and “fail” when their assertion is false. (Note: the assertion, not the statement about the class under test. You can assert a negative: “this class does not use reference semantics” will pass by demonstrating a lack of reference semantics.) Typically every method, every property, every constructor, every way there is of interacting with the class has at least one dedicated test. Combining multiple tests for a single method or property into an omnibus test method is typically permissible, but also typically frowned upon: that a property both rejects null or out of range values and publishes property change events really should be two assertions and two test methods.
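
As a minimal sketch of that last convention, assume a hypothetical Sensor class whose Name property both rejects null and publishes INotifyPropertyChanged notifications; the two behaviors get two test methods:

using System;
using NUnit.Framework;

[TestFixture]
public class TestSensorName
{
    /// <summary>
    /// Assertion: the Name property rejects null.
    /// </summary>
    [Test]
    public void TestNameRejectsNull ()
    {
        Sensor sensor = new Sensor ();
        // The null-forgiving operator (!) just quiets the nullable analyzer; we are
        // deliberately handing the property a null.
        Assert.Throws<ArgumentNullException> (() => sensor.Name = null!);
    }

    /// <summary>
    /// Assertion: setting the Name property publishes a PropertyChanged event naming "Name".
    /// </summary>
    [Test]
    public void TestNamePublishesPropertyChanged ()
    {
        Sensor sensor = new Sensor ();
        string? changedProperty = null;
        // Sensor is assumed to implement INotifyPropertyChanged.
        sensor.PropertyChanged += (sender, args) => changedProperty = args.PropertyName;
        sensor.Name = "Left bumper";
        Assert.AreEqual ("Name", changedProperty);
    }
}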

With all that in mind, let’s unit test a very simple class. Create a directory somewhere, and in that directory create a Library directory and a sibling Library.Test directory. In the Library directory, do:

dotnet new classlib
dotnet add reference path to BrianHetrick.Lib.General

and in the Library.Test directory, do:

dotnet new nunit
dotnet add reference ../Library
dotnet add reference path to BrianHetrick.Lib.General

This sets up a Library directory that will contain classes – the classes we will test – and a Library.Test directory that will contain test fixtures for the classes in Library. Both the classes and the tests will have access to the BrianHetrick.Lib.General library, with all the goodness that entails.

Now, let’s make a class to test. In Library, create a SimpleContainer.cs class:

using System;
using BrianHetrick.Lib.General;

namespace BrianHetrick.Example.Testing
{
    public class SimpleContainer <T> : ClassBase
    {
        private T _value;

        public SimpleContainer (T value)
        {
            _value = value;
        }

        public T Value
        {
            get { return _value; }
        }
    }
}

All this class does is hold a value supplied to its constructor, and return that value when the Value property is evaluated. It’s hard to get a class much simpler than this. Nevertheless, this class has a defect: do you see it? Worry not, the unit test will find it.

It is both traditional and good practice to have the unit tests follow the tested classes in inheritance and generic variations. Now, SimpleContainer inherits from ClassBase, so our TestSimpleContainer test fixture should derive from a TestClassBase class. Since tests need a little bit of specialized functionality – things like acquiring test objects, getting rid of test objects, and so forth – we will also have a TestBase class that all the test fixtures derive from. Let’s start with that one. In Library.Test, create a TestBase.cs class:

using System;
using BrianHetrick.Lib.General;
using NUnit.Framework;

namespace BrianHetrick.Example.Testing.Test
{
    /// <summary>
    /// An abstract base class for all NUnit tests for application classes.
    /// </summary>
    public abstract class TestBase : ClassBase
    {
        #region Constructors
        /// <summary>
        /// The constructor.
        /// </summary>
        protected TestBase ()
        {
            ConstructorExit (typeof (TestBase));
        }
        #endregion

        #region Non-Public Methods and Properties
        /// <summary>
        /// Dispose of an IDisposable test object instance.
        /// </summary>
        protected void Discard (object testInstance)
        {
            if (testInstance is IDisposable disposable)
            {
                disposable.Dispose ();
            }
            MethodExit (typeof (TestBase), nameof (Discard));
        }

        /// <summary>
        /// Get a new instance of the object to test.
        /// </summary>
        protected abstract object GetTestInstance ();
        #endregion

        #region Public Methods and Properties
        /// <summary>
        /// A method executed before every test.
        /// </summary>
        [SetUp]
        public virtual void SetupTest ()
        {
            MethodExit (typeof (TestBase), nameof (SetupTest));
        }

        /// <summary>
        /// A method executed after every test.
        /// </summary>
        [TearDown]
        public virtual void TeardownTest ()
        {
            MethodExit (typeof (TestBase), nameof (TeardownTest));
        }
        #endregion
    }
}

(This is the actual base class for the unit tests for the BrianHetrick.Lib.General library; only the namespace has been changed to protect the innocent.) This class simply provides functions typically needed in tests: a way to get a test instance, discard test instances (the void Discard method; the test instance might be IDisposable, and we mustn’t forget to call Dispose on it), and do per-test setup and teardown if needed (the virtual void SetupTest and virtual void TeardownTest methods).

Also note that this is a class, not a test fixture. This class is abstract: it cannot be a test fixture, as only concrete classes can be test fixtures. This class merely supplies resources to its descendants, which can be test fixtures.

Using this class as a base, we can derive a TestClassBase class that tests ClassBase descendants:

using System;
using BrianHetrick.Lib.General;
using NUnit.Framework;

namespace BrianHetrick.Example.Testing.Test
{
    /// <summary>
    /// An abstract base class for all NUnit tests for ClassBase-derived application classes.
    /// </summary>
    public abstract class TestClassBase : TestBase
    {
        #region Constructors
        /// <summary>
        /// The constructor.
        /// </summary>
        public TestClassBase ()
        {
            ConstructorExit (typeof (TestClassBase));
        }
        #endregion

        #region TestBase Overrides
        /// <summary>
        /// Get a new instance of the object to test.
        /// </summary>
        protected sealed override object GetTestInstance ()
        {
            return GetTestInstance (0);
        }
        #endregion

        #region Non-Public Methods and Properties
        /// <summary>
        /// Get a new instance of the object to test.
        /// </summary>
        /// <param name="valueIndex">
        /// An index of a set of values to be assigned to the instance of the object to test. Must
        /// accept at least 0 and 1.
        /// </param>
        /// <remarks>
        /// This method must produce at least two distinct values (with indices 0 and 1) and must
        /// produce a new instance at each invocation: thus two sequential invocations with
        /// <paramref name="valueIndex" /> 0 must produce two distinct objects with identical
        /// values: they must be Equal (,) but not ReferenceEqual (,), and two sequential
        /// invocations with differing <paramref name="valueIndex" /> values must produce objects
        /// that do not compare Equal (,).
        /// </remarks>
        protected abstract object GetTestInstance (int valueIndex);
        #endregion

        #region Method and Property Tests
        /// <summary>
        /// Assertion: Equals () returns <c>true</c> when the object is compared to itself, and
        /// <c>false</c> when compared to <c>null</c> or an arbitrary object.
        /// </summary>
        [Test]
        public void TestClassBaseEquals1 ()
        {
            object testObject = GetTestInstance (0);
            Assert.AreEqual (testObject, testObject);
            Assert.AreNotEqual (testObject, null);
            Assert.AreNotEqual (testObject, new object ());
            Discard (testObject);
            MethodExit (typeof (TestClassBase), nameof (TestClassBaseEquals1));
        }

        /// <summary>
        /// Assertion: Equals is value-based, not identity-based.
        /// </summary>
        [Test]
        public void TestClassBaseEquals2 ()
        {
            object testObject1 = GetTestInstance (0);
            object testObject2 = GetTestInstance (0);
            Assert.AreEqual (testObject1, testObject2);
            Discard (testObject2);
            testObject2 = GetTestInstance (1);
            Assert.AreNotEqual (testObject1, testObject2);
            Discard (testObject1);
            Discard (testObject2);
            MethodExit (typeof (TestClassBase), nameof (TestClassBaseEquals2));
        }

        /// <summary>
        /// Assertion: the ToString () method returns a string containing the simple type name of
        /// the object.
        /// </summary>
        [Test]
        public void TestClassBaseToString1 ()
        {
            object testObject = GetTestInstance (0);
            string testObjectTypeName = Log4NetHelper.GetSimpleTypeName (testObject.GetType ());
            string? testObjectToString = testObject.ToString ();
            Assert.IsTrue
               ((testObjectToString != null) &&
                testObjectToString.Contains (testObjectTypeName));
            Discard (testObject);
            MethodExit (typeof (TestClassBase), nameof (TestClassBaseToString1));
        }

        /// <summary>
        /// Assertion: the ToString () method differentiates objects by identity.
        /// </summary>
        [Test]
        public void TestClassBaseToString2 ()
        {
            object testObject1 = GetTestInstance (0);
            object testObject2 = GetTestInstance (0);
            Assert.IsFalse (string.Equals (testObject1.ToString (), testObject2.ToString ()));
            Discard (testObject1);
            Discard (testObject2);
            MethodExit (typeof (TestClassBase), nameof (TestClassBaseToString2));
        }
        #endregion
    }
}

This class (again with the protective namespaces, and again not an actual test fixture) just tests the properties ClassBase is supposed to guarantee, to make sure the ClassBase-derived class didn’t break anything. It also defines a GetTestInstance (int) method so tests can get multiple different instances of the class under test with different values, and overrides and seals the TestBase definition of GetTestInstance () so deriving unit tests don’t need to, and in fact cannot, redefine it.

Note that the [Test]-attributed methods – the actual tests – are each described with an assertion about the unit under test. At least in theory, if the test passes, the assertion is true – and can be depended upon. The collection of unit-tested assertions about a class is the “behavioral contract” of the object. Anything not a unit-tested assertion about a class is not a part of the behavioral contract. The tests themselves also use the NUnit Assert methods. These are used to check equality or inequality of actual and expected values, identity or difference of references, and so forth.

And now, we can create an actual unit test. We really should make a generic TestSimpleContainer <T> class to unit test a generic SimpleContainer <T>, and then a concrete descendant to test a constructed SimpleContainer <int> and another to test a constructed SimpleContainer <string> (because SimpleContainer can have both value and reference types for its type parameter, we really should test both), but we’ll just test SimpleContainer <int> and assume that will be good enough. So now we create a TestSimpleContainer class that is an actual test fixture:

using System;
using BrianHetrick.Lib.General;
using NUnit.Framework;

namespace BrianHetrick.Example.Testing.Test
{
    [TestFixture]
    public class TestSimpleContainer : TestClassBase
    {
        /// <summary>
        /// Get an instance of the object to test.
        /// </summary>
        /// <param name="valueIndex">
        /// An index of a set of values to be assigned to the instance of the object to test. Must
        /// accept at least 0 and 1.
        /// </param>
        protected override object GetTestInstance (int valueIndex)
        {
            return GetTypedInstance (valueIndex);
        }

        /// <summary>
        /// Get a strongly typed instance of the object to test.
        /// </summary>
        /// <param name="valueIndex">
        /// An index of a set of values to be assigned to the instance of the object to test. Must
        /// accept at least 0 and 1.
        /// </param>
        private SimpleContainer <int> GetTypedInstance (int valueIndex)
        {
            return new SimpleContainer <int> (valueIndex);
        }

        /// <summary>
        /// Assertion: the <see cref="SimpleContainer.Value" /> property has the value supplied to
        /// the constructor.
        /// </summary>
        [Test]
        public void TestConstructor ()
        {
            SimpleContainer <int> testObject = GetTypedInstance (0);
            Assert.AreEqual (testObject.Value, 0);
            Discard (testObject);
            testObject = GetTypedInstance (1);
            Assert.AreEqual (testObject.Value, 1);
            Discard (testObject);
        }
    }
}

This test fixture fits in with the classes so far by defining the GetTestInstance (int) method required by TestClassBase. It has a strongly typed GetTypedInstance so it doesn’t need to cast the GetTestInstance object to anything before accessing the methods and properties of the SimpleContainer <int> it creates. It doesn’t need to set up or tear down any environment and so doesn’t.

Now, we are ready. In Library.Test, give the command:

dotnet test

and observe!

A total of 1 test files matched the specified pattern.
Failed TestClassBaseEquals2 [156 ms]
Error Message:
    Expected: not equal to <BrianHetrick.Example.Testing.SimpleContainer<int> [3]>
But was:  <BrianHetrick.Example.Testing.SimpleContainer<int> [5]>

Stack Trace:
    at BrianHetrick.Example.Testing.Test.TestClassBase.TestClassBaseEquals2() in /media/bah/Working/Projects/BrianHetrick/Investigations/UnitTestExample/Library.Test/TestClassBase.cs:line 94

Umm, what? How could a class that just puts something in a box and then takes it out again fail?

Well, let’s look at the failing test. It’s TestClassBase.TestClassBaseEquals2, and its assertion is “Equals is value-based, not identity-based.” Oops. Our SimpleContainer did not override Equals, so it just inherited the ClassBase Equals that says “these are the same type, and neither is null, so they are equal.” And this is the defect in the simple container class: it broke the parent class’s behavioral contract. ClassBase requires its descendants to use value semantics, and SimpleContainer does not. And this is why having the unit tests in an inheritance hierarchy similar to that of the units under test is traditional: breaking the parent class’s behavioral contract is easier than it looks.

Now, this is software: it is supposed to do exactly what we want it to do. So we have a decision to make: should SimpleContainer <T> use value or reference semantics? (Currently, it does neither: any two SimpleContainer <T> objects with the same type parameter compare Equal.) We could remove the inheritance from ClassBase on the object (and on TestClassBase on the test fixture), and inherit System.Object’s reference semantics; or we could participate in the Equals pattern ClassBase sets up and use value semantics. Or we could just ignore the issue raised by the failing test: as long as no one ever needs to tell two SimpleContainer <T> instances apart, it makes no difference. But “ever” is a long time, so being good little do bees, we decide to participate in the value semantics ClassBase sets up for us. The repair to SimpleContainer <T> is simple:

public override bool Equals (object? obj)
{
    // base.Equals returns true only when obj is a non-null object of this same type,
    // so other is never null on the path where it is actually examined.
    SimpleContainer <T>? other = obj as SimpleContainer <T>;
    return base.Equals (obj) && other != null && Equals (_value, other.Value);
}

public override int GetHashCode ()
{
    return HashCode.Combine (base.GetHashCode (), _value);
}

And we’re done:

A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:     5, Skipped:     0, Total:     5, Duration: 124 ms - /media/bah/Working/Projects/BrianHetrick/Investigations/UnitTestExample/Library.Test/bin/Debug/net6.0/Library.Test.dll (net6.0)

Well, almost done. If you look for the log, you will find it in /home/username/.config/Microsoft Corporation/Microsoft.TestHost. The actual program we are running is Microsoft.TestHost, and ClassBase and its friends dutifully constructed the log in its program directory. This is actually correct, but not what we want. So we have to tell ClassBase and its friends to use the unit test assembly instead for information about where to put the log. First, we create the information to use with an addition to the Library.Test.csproj file:

<PropertyGroup>
  <AssemblyTitle>BrianHetrick.com Examples</AssemblyTitle>
  <AssemblyVersion>1.0.0.0</AssemblyVersion>
  <Company>BrianHetrick.com</Company>
  <Copyright>Copyright © 2021 Brian Hetrick</Copyright>
  <Description>Unit Test Examples</Description>
  <Product>Library.Test</Product>
</PropertyGroup>

Then, we tell ClassBase et al. to look there with a SetUpFixture.cs file:

using System;
using System.Reflection;
using BrianHetrick.Lib.General;
using NUnit.Framework;

namespace BrianHetrick
{
    [SetUpFixture]
    public class SetUpFixture
    {
        private Assembly _originalAttributedAssembly;

        [OneTimeSetUp]
        public void Setup ()
        {
            _originalAttributedAssembly = ProgramDataHelper.AttributedAssembly;
            ProgramDataHelper.AttributedAssembly = Assembly.GetExecutingAssembly ();
        }

        [OneTimeTearDown]
        public void TearDown ()
        {
            ProgramDataHelper.AttributedAssembly = _originalAttributedAssembly;
        }
    }
}

This cleverly changes where ClassBase and its friends look for the information used to set up logging, using an API (the AttributedAssembly property of the ProgramDataHelper static class, which is deliberately obscure) provided specifically for unit testing: an injection point. NUnit runs the OneTimeSetUp method before running any tests in the BrianHetrick namespace – which cleverly contains all our unit tests – and runs the OneTimeTearDown method after running all tests in the BrianHetrick namespace. And now, when we dotnet test, we find the log where it should be: /home/username/.config/BrianHetrick.com/Library.Test.

And note one final subtlety: SetUpFixture is not a ClassBase descendant. For what it needs to do, it cannot be: it is mucking with the logging initialization, and that has to happen before logging is configured, which has to happen before the first ClassBase descendant is created, and if SetUpFixture is a ClassBase descendant then that happened already and it’s too late.

You may note that the unit test code is substantially larger than the code of the unit under test. This is actually typical: typically, unit test code is three to ten times as large as the code being exercised. This has several causes: checking that something was done correctly is typically more detailed than actually doing the correct thing; in writing code to do something, we typically use inheritance and delegation to make the code as concise as possible given understandability, while in testing we typically make the code as understandable as possible (as it is the documentation of the behavioral contract); and testing is about as concrete as code gets, involving knowledge of the correspondence between particular known data and known good results, while functional code has parameters that are filled in at run time and produces whatever it produces. However, test code is typically much simpler than functional code: it generally consists of setting up an object, doing a single operation, and testing that the operation gave expected results.

Because of the simplicity, developing a unit test for a class generally takes only about twice the time of developing the class in the first place, even though there is three to ten times as much of it. Unit test code also should, if possible, be written by a different developer than the one writing the unit to be tested (to avoid inadvertent white box testing). And the only thing a unit test ever does is catch mistakes – now and in the future. Management typically opposes unit tests “unless they are needed” because “just don’t make mistakes,” but mistakes are a fundamental aspect of human creation. Therefore, it is generally a wise idea to unit test: “well, if it doesn’t need to work, we can develop it really quickly” is only rarely what is actually desired. Although sometimes it is: that is a business decision, not a technical one.

Retrofitting unit tests to previously developed code is typically a low payoff and moderate risk activity. All code has faults, but actual failures have typically been mostly beaten out of in-production code. Unit tests will reveal faults in the thousands and tens of thousands in the code base, but repairing the faults is risky because the code is already in production. The in-production code’s behavioral contract is effectively “whatever it does now:” there will be other code depending on the faulty behavior, and there is no telling where that other code might be. Repairing faults in code changes the code’s behavior, and this behavior change may create defects – faults and possible failures – elsewhere in the code base, as has been discussed previously. Just as you should not catch exceptions if you don’t know how to fix them, there is really no point in creating a catalog of faults if you aren’t going to repair them. Accept that you have a code base without unit tests: that, too, is a business decision, and one management already made. Unit tests for code being added? Sure. Unit tests for existing deployed code? Questionable. It’s like parenting teenagers: don’t ask a question unless you’re ready to deal with the answer.

In a group project, a typical norm is for code to have to pass its unit tests before the code and the unit tests get checked in. In a group project, a typical safeguard is that a check-in triggers a build of the software and the unit tests, and a run of the unit tests. (If the unit tests take a long time to run, this might occur nightly, instead of at every check-in.) Having the product unit tests fail because of your check-in is known as “breaking the build,” and will make you quite unpopular. Don’t do that. (Unless you’re the fair-haired, nimbus-wearing, negative-productivity favorite of management that everyone else carries along. Then cleaning up the trail of destruction you leave behind you is everyone else’s job, not yours. That happens too. You can tell you are the fair-haired, nimbus-wearing, negative-productivity favorite of management that everyone else carries along when you go back to some code you wrote six months ago, and realize you have never seen this code before: the team made sure the product healed around the destruction you wreaked. An implication of making this discovery is that the team finds it easier to just redo your work than to try to teach you how to do it right in the first place: think long and hard about the implications of that, bucko.)

Mocks, Interfaces, Dependency Injection, and Testability

Code executes in an environment, and most of that environment is other product code. But unit testing exercises product components in isolation from the rest of the product. So how does that work? A Framistat may need a FramistatRotater to do its job, but FramistatRotater is another piece of the product. How do we resolve this conundrum? (Q: Who shaves the barber? A: Nobody, she doesn’t need to shave.)

The resolution to this conundrum is “mocks” and “dependency injection.”

A mock is an object that pretends to be another object: for example, a MockFramistatRotater that is defined in and used by the TestFramistat class, which implements the API of FramistatRotater but does something completely different under the covers. (Typically a mock will have a table of inputs and outputs: it looks up its inputs in the table, and returns whatever the table says are the outputs. The table is cleverly constructed to hold whatever inputs the test will cause the unit under test to present to the mock, and probably also logs all those inputs so you can white box test if you want.)
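
A minimal sketch of such a table-driven mock, assuming for illustration that IFramistatRotater has a single RotationFor method, might look like this:

using System.Collections.Generic;

// IFramistatRotater and its RotationFor method are illustrative assumptions, not a real API.
public interface IFramistatRotater
{
    double RotationFor (int framistatIndex);
}

public class MockFramistatRotater : IFramistatRotater
{
    // The canned input-to-output table, constructed by the test to cover exactly the
    // inputs the unit under test is expected to present.
    private readonly Dictionary<int, double> _cannedResults;

    // A log of every input presented, so the test can inspect how the unit under test
    // used its collaborator (useful for white box checks, if wanted).
    public List<int> ObservedInputs { get; } = new List<int> ();

    public MockFramistatRotater (Dictionary<int, double> cannedResults)
    {
        _cannedResults = cannedResults;
    }

    public double RotationFor (int framistatIndex)
    {
        // No real work: record the input, look it up, and return whatever the table says.
        ObservedInputs.Add (framistatIndex);
        return _cannedResults[framistatIndex];
    }
}

The test constructs the dictionary with exactly the inputs it expects the unit under test to present; an unanticipated input throws, which is itself a useful signal.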

Dependency injection (DI) is getting the unit under test to use the mocks rather than the product code. This typically needs cooperation from the unit under test; this cooperation is called testability.

Testability has several practices associated with it. First is a reliance upon interfaces, rather than concrete objects: the Framistat needs to accept either a FramistatRotater or a MockFramistatRotater, and the easiest way to do that is to accept an IFramistatRotater instead of either class. (Otherwise, the MockFramistatRotater needs to be a descendant of FramistatRotater, which, since the mock is replacing everything, implies that everything in a FramistatRotater is virtual, which violates the open-closed principle.) Second, objects need to have injection points where mocks can be introduced: the Framistat needs to have an IFramistatRotater constructor parameter that tells it what sort of framistat rotater to use, or an IFramistatRotater property that lets someone else give it the framistat rotater to use. And for usability, because of course a Framistat needing a FramistatRotater is an implementation detail, the Framistat needs to lazily initialize the framistat rotater property: if no one has told it which one to use, it needs to create one on its own.
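
A minimal sketch of those practices, reusing the assumed IFramistatRotater interface from the previous sketch (the FramistatRotater production class here is likewise made up for illustration), might look like this:

public class Framistat
{
    private IFramistatRotater? _rotater;

    // Injection point: a unit test (or a DI framework) can supply a mock here.
    public IFramistatRotater Rotater
    {
        // Lazy initialization: if no one has supplied a rotater, quietly create the
        // production one, so ordinary callers never see the dependency at all.
        get { return _rotater ??= new FramistatRotater (); }
        set { _rotater = value; }
    }

    public double Rotate (int framistatIndex)
    {
        // The Framistat offloads the actual work to whatever rotater it was given.
        return Rotater.RotationFor (framistatIndex);
    }
}

public class FramistatRotater : IFramistatRotater
{
    public double RotationFor (int framistatIndex)
    {
        // Stand-in for the real production computation.
        return framistatIndex * 30.0;
    }
}

A unit test can now write new Framistat { Rotater = new MockFramistatRotater (…) } and the Framistat never notices the substitution, while a production caller that supplies nothing gets the real FramistatRotater through the lazy default.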

A great help in dependency injection is a dependency injection framework. What a dependency injection framework does is create objects with the dependencies filled in. Typically there is something like an [Import] attribute on a constructor parameter or property, which tells the DI framework “here is an injection point.” Typically there is something like an [Export] attribute on classes, which tells the DI framework “here is a thing intended to plug into injection points.” Or, for unit testing, you tell the framework “here is a thing to plug into an IFramistatRotater injection point.” Then you tell the DI framework to create a Framistat, and the Framistat you get back has the framistat rotater filled in already. You don’t actually need a DI framework, ever; but the alternative is to have all objects use lazy initialization to satisfy their dependencies, which is somewhat unnatural.

There are two lightweight dependency injection frameworks built in to .NET, and confusingly both are called MEF, the Managed Extensibility Framework. (The MEF currently recommended is MEF2, the System.Composition namespace, rather than MEF, the System.ComponentModel.Composition namespace. The two are similar in function and approach but differ in their detailed usages. MEF2 is a NuGet package that must be added to the project with dotnet add package Microsoft.Composition. MEF2 also uses some of MEF directly, further confusing things.) Microsoft also created the PRISM framework, which is a heavyweight and very capable dependency injection framework with additional decoupling goodies like an event broker. Both MEF and PRISM are now open source. Finally, there are a number of third party and other open source dependency injection frameworks for .NET. Which, if any, dependency injection framework to use in a project is one of those cross-cutting concerns, like the logging framework, that affects just about everything. Some DI frameworks assume a particular logger, so these two decisions in particular can interact with one another.
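
Here is a sketch of how the attributed style plays out in MEF2 (the System.Composition namespace), again using the made-up Framistat types from above; the attribute and container types are MEF2’s, everything else is illustrative:

using System.Composition;
using System.Composition.Hosting;

[Export (typeof (IFramistatRotater))]
public class ComposableFramistatRotater : IFramistatRotater
{
    public double RotationFor (int framistatIndex)
    {
        // Stand-in for the real production computation.
        return framistatIndex * 30.0;
    }
}

[Export]
public class ComposableFramistat
{
    public IFramistatRotater Rotater { get; }

    // Injection point: the DI framework calls this constructor and supplies the parameter.
    [ImportingConstructor]
    public ComposableFramistat (IFramistatRotater rotater)
    {
        Rotater = rotater;
    }
}

public static class CompositionExample
{
    public static double RotateOnce (int framistatIndex)
    {
        // Tell the container which parts exist; a unit test would list a mock here instead.
        ContainerConfiguration configuration = new ContainerConfiguration ()
            .WithPart<ComposableFramistat> ()
            .WithPart<ComposableFramistatRotater> ();
        using (CompositionHost container = configuration.CreateContainer ())
        {
            // The object comes back with its rotater already filled in.
            ComposableFramistat framistat = container.GetExport<ComposableFramistat> ();
            return framistat.Rotater.RotationFor (framistatIndex);
        }
    }
}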

Testability of a class may also involve functionality not actually needed by the product of which the class is a part. Testing an object might involve examining intermediate results, or checking that process steps happen in a particular order. This can often be handled by inspecting arguments or patterns of access to per-test mocks; but it can also be handled by, for example, the object publishing events that only the unit tests subscribe to, or setting intermediate result properties that only the unit tests look at. These arrangements are called test points.

Excess functionality is an inherent security risk. The introduction of test points increases the attack surface of the object, and the introduction of injection points (“here is some code you should run or some data you should trust”) increases the attack surface of the object dramatically. This has negative implications for data security. A stumbling block to exploitation can be introduced by making test and injection points internal and identifying the unit test assembly as a friend assembly; this lets the unit tests access internal class members. protected class members can be exposed as public members by a unit test-specific descendant class. The unit test and attacking software both can use reflection to access even private class members; for unit tests that is profoundly white-boxy and I don’t recommend it, but I’m sure someone somewhere says it’s the One True Way™, so your mileage may vary. All the visibility attributes do for security is slightly alter the ease of attack, not the possibility of attack; so maybe making access points public is not such a big deal after all. As always, software has to try to balance competing and even conflicting goals.
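
The friend assembly arrangement itself is a one-line attribute in the Library project (an AssemblyInfo.cs file is a conventional home for it); the assembly name must match the test assembly exactly, and a PublicKey value must be appended if the test assembly is strong-named:

using System.Runtime.CompilerServices;

// Let the Library.Test assembly see this assembly's internal members.
[assembly: InternalsVisibleTo ("Library.Test")]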

Writing unit tests, especially ones that require injecting mocks of other product functions, is boring, repetitive, tedious, and dreary. It is not fun, or glamorous, or intellectually stimulating, or even particularly rewarding. So a lot of effort goes into justifying not doing it, or doing something “automatic” that is “just as good” instead.

I have no problem with tools that write unit tests: code that produces a text file which, when compiled and run, is a unit test. Maybe you feed the tool a list of class names, and it uses reflection to find read-write properties for each class, and it writes the tedious “put a value in the property, check the value in the property, check the history of event publications for the property change events” code that is basically boilerplate anyway. Wonderful idea: each of those output files is a good start on a unit test; flesh each one out and stick it in the repository.

But I have a big problem with code that does something with reflection across all the classes in the product pretending that is a unit test. The tool should write the test, not do the test. Show me the code that tests that this class in particular fulfills this assertion in particular. I am very happy that we have an audit tool that checks that all our classes implement ToString; I am very unhappy that we do not have a unit test that checks that Framistat.ToString produces a string starting with BrianHetrick.App.WorldDomination.Intermodal.NetworkSubversion.Framistat like it’s supposed to. I’m glad we have an audit tool that checks that all our classes’ read-write properties publish property change events; but I’m unhappy we don’t have a unit test that checks that the Framistat.Frobisher property rejects a frabulated Frobisher (because a Framistat requires a defrabulated Frobisher) like it’s supposed to. If you’re checking something across multiple classes, it’s not a unit test, is it? It’s an audit tool, and that is a different thing.

Unit tests are supposed to be detailed, boring, and tedious to the point of tears. Maybe some day we will have tools to automatically unit test classes, probably about the same time we have tools to automatically write the classes in the first place. But we don’t have them today and we aren’t going to anytime soon. Either do it, or don’t: but don’t do something else “just as good” and pretend it’s a unit test. It’s not a unit test; and it probably isn’t “just as good,” either.

Test Driven Development

There is a software development philosophy called “test driven development,” where the overall coding cycle is: write a test for the next bit of desired behavior and watch it fail; write just enough code to make the test pass; refactor the result; and repeat.

This is a defensible approach when the class under development is self-contained and does not interact with its environment: a class that computes something from other objects, or keeps an evanescent record of something, or something else that is local to the program being developed. But recall that in unit testing, we call code that makes the unit tests – and only the unit tests – pass a “mock.” There is a real danger that test driven development results in a program that can get through its demo, and only its demo, and fails miserably in actual use: the entire program is a mock.

A class that is supposed to access an external database should start out actually accessing an external database, not an in-memory structure that behaves like a database and has exactly enough capacity for the ten cases the unit test exercises. The environmental interaction itself, rather than the observable internal results of the environmental interactions, is the point and should be the focus.

In general, developing to the test is about as good an idea as teaching to the test: it leaves you with software (or students) that excels at answering a very specific set of questions but is unable to handle anything other than that very specific set of questions. So while test driven development can be a useful conceptual framework, I do not recommend it as an operational framework.

Training the program under development to handle exactly and only the test suite is in fact a thing that happens. (It happens most publicly in “AI” development, where training data keeps popping up in output because the system learned that “respond to a stimulus like this with exactly this output” was what it was supposed to do.) It happens when you mistake the purpose of tests as being to test the software. Tests exercise the software, but only incidentally test it: what they are testing is the software development process. Every test failure means the process burped out incorrect software. The root cause analysis – and yes, for anything other than a typo or simple “d’oh!” oversight you should do a quick root cause analysis – should always wind up with a development process recommendation. The root cause analysis doesn’t need to be a formal board of inquiry. It can be as simple as pondering “how did the development process let that mistake get made? Could that mistake get made again? Do we need another check in our checklist? Do I need another check in my checklist?” The problem is almost always process, almost never happenstance, environment, or people.

Subsystem Testing

Subsystem testing, or integration testing, is testing that subsets of the product that are supposed to work together actually do so. Instead of testing units in isolation, it tests assemblages of units together. Typically the same test framework used for unit tests can be used for subsystem tests; after all, if the framework was capable of testing the gozintas (goes into’s) and gozoutas (goes out of’s) of the individual units, it can test the gozintas and gozoutas of assemblages as well. To some extent, simply repeating the unit tests without injecting mocks is a type of integration test; but this risks ignoring any environmental effect the software is supposed to have and is generally insufficient. Typically integration tests must be separately developed.

The focus of integration testing is on the interactions among the various objects composing the application, rather than on the objects themselves, and “end to end” tests are common. Can the input massage find all the various types of inputs it is supposed to be able to find? Does the input massage pass along the massaged input to the processor correctly? Does the processor pass along the processed, massaged input to the output processor correctly? Does the output processor correctly produce all the outputs it is supposed to produce? Do the various options and switches and dials and knobs over here on the UI or in the configuration actually affect the processing they are supposed to affect over there in the logic?

Injection points for integration tests tend to be data streams, not properties of objects; test points of integration tests tend to be databases or data streams, rather than return values of methods. Coverage analysis of integration tests tends to concentrate on major functions rather than code paths or decision points.

Subsystem testing is very aware of the structure of the subsystem being tested and the data flow through it – because that is basically what is being tested in the first place. You are not testing that the event publication mechanism works, you are testing that the circumstances presented made the right things happen – that events were used is typically irrelevant. You are not testing that the input massage correctly handles a byte order mark, you are testing that the input having a byte order mark results in the output, produced a hundred classes away, having a byte order mark, which implies the “use a byte order mark” indication did not get lost on its way through those hundred classes.
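
A minimal sketch of that byte order mark check as a subsystem test, where a hypothetical Pipeline class and its Run method stand in for the input massage, processor, and output processor wired together with no mocks injected, might look like this:

using System.IO;
using System.Linq;
using System.Text;
using NUnit.Framework;

[TestFixture]
public class TestPipelineByteOrderMark
{
    /// <summary>
    /// Assertion: an input starting with a byte order mark produces an output starting with one.
    /// </summary>
    [Test]
    public void TestByteOrderMarkIsPreserved ()
    {
        // An input that starts with the UTF-8 byte order mark.
        byte[] bom = Encoding.UTF8.GetPreamble ();
        byte[] body = Encoding.UTF8.GetBytes ("<telemetry>42</telemetry>");
        using (MemoryStream input = new MemoryStream (bom.Concat (body).ToArray ()))
        using (MemoryStream output = new MemoryStream ())
        {
            // Run the real components end to end: no mocks are injected here.
            new Pipeline ().Run (input, output);

            // The output, produced at the far end of the pipeline, still starts with the
            // byte order mark: the "use a byte order mark" indication survived the trip.
            byte[] produced = output.ToArray ();
            Assert.IsTrue (produced.Length >= bom.Length);
            for (int i = 0; i < bom.Length; i++)
            {
                Assert.AreEqual (bom[i], produced[i]);
            }
        }
    }
}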

In a very real sense, subsystem testing is where you test the design of the classes you wrote: do the classes, all of which are correct to their specifications (that’s the unit tests’ work), actually implement the desired functionality? We know the over temperature alarm works, we know the temperature sensor works, but is the temperature sensor hooked up to the over temperature alarm? Does the over temperature alarm going off cause the material conveyor to stop, and the fire control flood to happen? Does stopping the material conveyor cause the material feed to stop? We know all of the pieces work – but does the design work? Did we build the correct emergent behavior?

Systems tend to be loosely coupled: the steering mechanism in a car doesn’t really care about the fuel injector, and the air bags don’t care about either of the others. If they communicate at all with one another, they use some well-defined and typically general purpose interconnect. Subsystems – like “the power train” – tend to be tightly coupled: the transmission cares very much about what both the engine shaft and the differential are doing, and the engine control unit cares about all of them as well as the fuel injector and other things. One implication of this is that repairing integration test failures tends to be a great deal more expensive than repairing unit test failures: unit test failures tend to be doing a right thing in a slightly wrong way, or a slightly wrong thing instead of a right thing, or occasionally not doing a thing at all, all of which are easily repaired; while integration test failures tend to be that some needed mechanism got entirely overlooked, or that something’s role changed between the architecture and the design. These repairs tend to be substantial, often involving entirely new capabilities in the software. Integration tests are where design failures become visible.

Test Databases and Files

When databases are involved, the database used by the tests, even the unit tests, is itself a test article, and should be treated as a first-class development object: it should be under change control, have a list of what circumstances should be present in the data, its own unit tests (typically an audit program that makes sure all the desired circumstances are present), and so forth. And no, a copy of a customer database is not sufficient: each customer uses the product in a particular way, and you want to test all the ways; each customer is an example of what does occur, and you want to test what can occur. Nor is a large quantity of randomly generated rows sufficient for functional testing (although it is frequently helpful for performance testing): you want all the edge cases to be present in the data, and that’s not going to happen by coincidence. This first-class status is true of all test data, not just test databases: text files, CSV files, Microsoft Office documents, icons, images, sound clips, all are first-class development objects.

The traditional way to handle test databases in particular is with a SetUp or OneTimeSetUp method in the test fixture that creates a new temporary database (using a local solution such as SQLite) with appropriate known contents: typically by playing an SQL script to the database system to create the database schema, followed by bulk loading a set of CSV files into tables. Whether indices and foreign key constraints are added to the schema before or after the bulk load – there are often performance advantages to creating an index or constraint when the data is already present – is a matter of taste. Then the setup method makes whatever arrangements are necessary to point the tests at the newly created database. The tests themselves point the objects under test at the test database, and typically check the database before and after the test to make sure the circumstances being tested for are in fact present and the unit or subsystem under test did whatever was expected to the data store, in addition to behaving the right way to other software. The SQL script and the CSV files are integral components of the tests and are checked into the repository right alongside the test code.
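
Here is a minimal sketch of that arrangement, using the Microsoft.Data.Sqlite package (dotnet add package Microsoft.Data.Sqlite); the schema.sql and customers.csv files and the Repository.ConnectionString injection point are assumptions made up for the illustration:

using System.IO;
using Microsoft.Data.Sqlite;
using NUnit.Framework;

[TestFixture]
public class TestCustomerRepository
{
    private string _databasePath = string.Empty;

    [OneTimeSetUp]
    public void CreateTestDatabase ()
    {
        _databasePath = Path.Combine (Path.GetTempPath (), Path.GetRandomFileName ());
        using (SqliteConnection connection = new SqliteConnection ("Data Source=" + _databasePath))
        {
            connection.Open ();

            // Play the checked-in SQL script to the database to create the schema.
            using (SqliteCommand createSchema = connection.CreateCommand ())
            {
                createSchema.CommandText = File.ReadAllText ("schema.sql");
                createSchema.ExecuteNonQuery ();
            }

            // Bulk load the checked-in CSV file holding the known test circumstances.
            // (A real loader would use a proper CSV parser; this is only a sketch.)
            foreach (string line in File.ReadLines ("customers.csv"))
            {
                string[] fields = line.Split (',');
                using (SqliteCommand insert = connection.CreateCommand ())
                {
                    insert.CommandText = "INSERT INTO Customers (Id, Name) VALUES ($id, $name)";
                    insert.Parameters.AddWithValue ("$id", fields[0]);
                    insert.Parameters.AddWithValue ("$name", fields[1]);
                    insert.ExecuteNonQuery ();
                }
            }
        }

        // Point the objects under test at the test database rather than production.
        Repository.ConnectionString = "Data Source=" + _databasePath;
    }

    [OneTimeTearDown]
    public void DeleteTestDatabase ()
    {
        // Microsoft.Data.Sqlite pools connections; clear the pool so the file can be deleted.
        SqliteConnection.ClearAllPools ();
        File.Delete (_databasePath);
    }
}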

Since the test database instance is (typically) durable across an entire test fixture or collection of test fixtures, there is a possibility of test interaction through the database state: one test might depend on another having run, or not having run. While this is typically regarded as acceptable or even desirable in subsystem testing (where test scenarios typically involve lengthy sequences of operations), it is typically not acceptable in unit testing (where the individual tests are supposed to be independent of one another). The unit test constraints can be handled by recreating the database for each unit test; or by reversing the changes made to the database as part of checking the effect of the unit test; or by ensuring the unit tests use independent subsets of the data in the database. But this often also means simply repeating the unit tests without injecting mocks is insufficient as a subsystem test: when dealing with durable external state, the tests are supposed to interact with one another. To reuse unit tests as subsystem tests, you will probably need a subsystem test-specific set of database contents in addition to a unit test-specific set of database contents.

System Testing

System testing is, as one might suspect, testing the entire system as a unit after deploying to a test environment. There will probably be internal network connections requiring certificates, a durable client-server database holding “historical” data to be manipulated, data feeds from and to either simulated or test instances of external systems, clients making requests of the system, and so forth. The emphasis of system testing is overall function and scalability: does the system as a whole meet needs?

It is generally wisest to introduce the new system to a simulation of the existing production environment before introducing it to the real thing. The feeds from other systems should be watched and recorded for several months beforehand, so actual live data with all its warts and timing can be fed to the system being tested before unleashing the test system on the network. The resulting feeds to other systems should be recorded and tested by the owners of the other systems before unleashing the test system on the network. And finally, the test system is actually connected to the deployment environment, typically with the entire environment in a “test” mode where whatever happens can be isolated and recovered from. This requires cooperation from the owners of the adjacent systems, which requires a commitment of the entire organization; a major system test is much bigger than a single development team’s personnel and capital equipment can cover. There are a lot of logistics to a system test.

System tests are where requirements failures become visible. Set expectations appropriately: there will be requirements failures, and repairing them will be months-long efforts. Someone probably knew the production database backups saturated the network, leaving no bandwidth for the new system: but that someone never thought to tell the development team when they set network timeouts at an “entirely reasonable” five seconds per transaction, not per packet. Someone probably had replacing that one 10 Mbps network link with a 1 Gbps network link on their “to do” list pending a budget allocation: that has to get done now. Someone probably knew that mainframe over there’s ancient COBOL program actually generates XML-ish text, without quote marks on tag attribute values, instead of actual XML, but called it “XML” when interviewed: that ancient COBOL program has to get fixed now, or more likely an XML-ish adapter has to be inserted into the data stream before the XML reader. The existing message broker might not be able to handle the additional load of the messaging design of the new system; there needs to be a dedicated, or at least another, message broker. An inbound data feed might need to be added; an outbound data feed might need to be added; no one ever mentioned the management dashboard system wants to run queries on the new database and would cause transaction deadlocks. The environment is always much more complicated, more messy, and more demanding than was anticipated.

Pencil in a second system test about four months after the first. You will probably need it.

Testing Testing

Testing is important in developing software. No one can possibly do everything right the first time: it is simply beyond human capabilities. “Just don’t make mistakes” is wishful thinking, not reasonable direction. So the testing process is critical to the development process. So the testing process itself should be tested. How do you test a test process?

There are several ways to test the testing of software. The timing with which defects are discovered obeys statistical rules; you can look at the defects discovered vs. testing effort graph and get a pretty good idea of the number of defects yet to be discovered by your testing regime. You can “seed” defects – introduce known defects into the software – and see if the testing picks them up: the fraction of seeded defects that is picked up by the tests is a pretty good measure of the testing effectiveness. (Just don’t seed defects that the tests already picked up, because the tests will just pick them up again, telling you nothing. Seed defects of the same style as those you’ve already found, because those are the styles of defects the development process is producing.) You can sample the things you are testing: pick one at random and torture it into yielding as yet undiscovered defects. You will find new defects after torturing at most three components: the obscurity of the formerly undiscovered defects is a measure of the quality of your testing process. It only malfunctions when the moon is full, there are dementors flying about, and the Ministry is under pitched assault: good testing. It malfunctions when someone sneezes: not so good testing. It malfunctions when someone breathes: bad testing. (Incidentally, if every time you torture something it yields new defects, you are not ready to ship, no matter what your Jira or Bugzilla backlog says. Instead, the project needs to seriously reconsider what it thinks a “test” is supposed to be.)

And this is why “testing,” like UI design, is actually a specialty skill in software development. It’s nebulous, difficult, delicate, and complicated. It has specialized theory – structural, statistical, environmental, and often implementation language dependent – and a good tester is on friendly terms with that theory. But it is also something every programmer needs to know, at least in outline.