
Test levels: why should we care?

Today I continue to talk about why rather than how. Different sources (academic, military, etc.) define three (or four) test levels: Unit, Integration, System (and UAT). I have yet to see an unambiguous definition that everyone agrees with… I wonder why the military one is the most accepted in industry… But I don’t really care. I care about not missing defects, and I would like to look at test levels from that perspective...

Intro story: a bug
I like RTS (real-time strategy) games. My favorite one is Command & Conquer. I still remember one particular defect in the first game (1995) that was only detected “in production” and only fixed in a patch. You can still read about it: Build a turret. Sell the building, but cancel the sale before it disappears. You get a few minigunners every time.
Do you know what the problem is here? It was never supposed to be possible to cancel the sale of a building. It was possible to order any unit to “stop”, i.e. to cancel its last order (e.g. stop a ground bombardment). And you could give orders both to units (e.g. tanks) and to turrets. It seems to me that they simply never imagined how the “stop” order would interact with building-related orders, because it was originally a unit-related order. (Moreover, given another cheat that allowed you to sell minigunners, this is actually an unlimited source of money, so it’s quite a bug!)
I don’t blame the Westwood Studios testers for missing this bug (if they had any testers at all back in the 90s). However, I want you to learn a lesson from this missed bug, because I find this type of defect in software all the time: two features interact in undesired ways, simply because no one ever thought about how they would interact. As a tester, when I see a new feature, I always ask myself: what other features may interact with this feature in any way (other than those intended by the architect and developers)?
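To make the question concrete, here is a minimal Python sketch of the same kind of interaction. It is a toy model, not the game's real code: the Building class, its states, and the assumption that infantry appears as soon as a sale starts are all invented for illustration. The helper tries every short sequence of orders and reports those that break the invariant "a building that is still standing has produced no free infantry".

```python
from itertools import product


class Building:
    """Toy model of a sellable structure (hypothetical, not the game's code)."""

    def __init__(self):
        self.state = "standing"      # "standing", "selling" or "sold"
        self.spawned_infantry = 0

    def sell(self):
        # Assumption: the crew is released as infantry when the sale starts.
        if self.state == "standing":
            self.state = "selling"
            self.spawned_infantry += 1

    def stop(self):
        # "Stop" was designed for units; here it also cancels a sale.
        if self.state == "selling":
            self.state = "standing"

    def tick(self):
        # The game clock completes a sale that was not cancelled.
        if self.state == "selling":
            self.state = "sold"


ORDERS = ("sell", "stop", "tick")


def find_infantry_exploits(max_len=3):
    """Return order sequences that keep the building yet spawn infantry."""
    exploits = []
    for length in range(1, max_len + 1):
        for sequence in product(ORDERS, repeat=length):
            building = Building()
            for order in sequence:
                getattr(building, order)()
            if building.state == "standing" and building.spawned_infantry > 0:
                exploits.append(sequence)
    return exploits


shortest = min(find_infantry_exploits(), key=len)
print(shortest)  # ('sell', 'stop')
```

Nobody would write a dedicated test for "sell, then stop" unless they had already thought of it. Enumerating combinations of orders against a simple invariant is one cheap way to ask the question mechanically.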
What does this story have to do with test levels? I would like to call this bug an integration defect, because the cause of the issue is a wrong integration between two features: the building order to sell a structure and the unit order to stop the last order. However, the idea of integration tests is to test only the intended integration. Perhaps that’s why we have the next level, system tests, to find these types of issues? Unfortunately, that’s not what system tests are typically intended for…

Contradictory practices applied at different test levels
How contradictory? … Black box vs. white box, unit isolation, which level tests against requirements, etc. There is not a single concept that the practices agree upon. Black box techniques (boundary values, etc.) may and should be applied to unit tests. There are cases when people monitor code coverage (a white box test attribute) at the system test level. On the other hand, it may still be reasonable to test every single function during UAT, especially if APIs are a part of the deliverable. Agile methods have shown that unit tests may be built against requirements, while some definitions associate the system level with the goal of requirements coverage.
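As a small illustration of a black box technique at unit level, the Python sketch below derives boundary value cases from a requirement alone and runs them against a single function. The requirement ("a queue holds 1 to 5 items") and the names boundary_cases and is_valid_queue_size are made up for this example.

```python
def boundary_cases(low, high):
    """Derive black box inputs from a required range [low, high]:
    each boundary plus its nearest neighbours, with the expected verdict."""
    return [
        (low - 1, False), (low, True), (low + 1, True),
        (high - 1, True), (high, True), (high + 1, False),
    ]


def is_valid_queue_size(n):
    """Unit under test (hypothetical): accepts sizes from 1 to 5."""
    return 1 <= n <= 5


def failed_cases():
    """Run every boundary case and collect the ones that do not match."""
    return [
        (n, expected)
        for n, expected in boundary_cases(1, 5)
        if is_valid_queue_size(n) != expected
    ]


print(failed_cases())  # [] means every boundary case passed
```

The cases are written without reading the body of is_valid_queue_size, so this is a unit test and a black box test at the same time.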

Contexts: academic, military, other…
So let me try to argue that there are at least the following contexts in which levels are defined: academic, military/formal and other/misc. While the military one is the one most typically described in testing FAQs, let me tell you how I see the academic one (I hope you will see the difference). And by the way, the military definition talks about test phases, not levels.
When a developer writes some code, it should be tested whether the code does what it is intended to do (actually, what the developer intended). When developers integrate two or more pieces of code, it should be tested that the code integrates the way it is intended to (actually, how the developers or the architect intended). Those are unit and integration tests. That’s why we test white box there: we only test whether the code works as the developer intended. But what if the developer has missed some aspect of how a unit should work, or the architect has missed some aspect of the integration? They may have missed some special cases or exception handling, or simply forgotten to complete the implementation of some stub/empty method. So we may have 100% test coverage but still have a lot of bugs, just because some code is missing.
I remember testing server code written in C for stability. Most of the memory leaks occurred because close() methods had empty implementations and did not free certain system resources. Once we are pretty sure that the code we wrote does what it is intended to do, we need to spend more time making sure that no code is missing. That’s why some sources say that by the system testing stage most of the functionality has been tested already, and that system level testing is mostly about non-functional quality attributes such as performance, reliability, etc.
But wait a bit, how about UAT? At the system test level we check that no code is missing. All that is left is to check that no features are missing, right? That’s why we do UAT. That’s how academics may see this. In real life there is one more goal for UAT: to check how the system works in the customer’s environment, because it turns out that the environment (not only the user’s interpretation of requirements) significantly influences how the system works.

Summary: who cares?
So I don’t really care whether one defines integration tests as tests that check how correctly module A uses module B’s interface, or how two systems interact (e.g. copy-paste through the Windows clipboard), or anything else. What I care about is two things:
1. We can do testing even before the system is built (using stubs, APIs, etc.) or before the target environment is set up (in a test environment). Even when the system is built, we may still want to bypass the GUI and test at the API level because it is easier to automate.
2. When testing we should take care to cover requirements (what was intended) and code (what is built), but also make sure that there is no missing code (error handling, memory and other optimizations, etc.) and no missing requirements (what looked nice on paper may be unusable in reality). We should also pay special attention to pieces of code that integrate code written by different developers.
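The first point can be sketched as follows: a hand-written stub replaces a module that does not exist yet, so the code that depends on it can be tested through its API with no GUI involved. PriceServiceStub, order_total and the prices are hypothetical names and numbers chosen for this example.

```python
class PriceServiceStub:
    """Stands in for a pricing module that is not built yet (hypothetical)."""

    def __init__(self, prices):
        self.prices = prices
        self.calls = []          # records how the interface was used

    def price_of(self, item):
        self.calls.append(item)
        return self.prices[item]


def order_total(items, price_service):
    """Code under test: sums prices obtained through the service interface."""
    return sum(price_service.price_of(item) for item in items)


stub = PriceServiceStub({"turret": 600, "barracks": 300})  # made-up prices
total = order_total(["turret", "barracks", "turret"], stub)
print(total, stub.calls)  # 1500 ['turret', 'barracks', 'turret']
```

Because the stub records its calls, the same test also checks the integration point itself: which requests the code under test actually sends to the other module.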