BBST Practical Lessons: False Alarms
Submitted by John McConda on Sat, 14/03/2009 - 20:49.
After pulling into my garage yesterday, I accidentally pressed the panic button on my key fob. My car then proceeded to act like a nervous, hysterical person, with the lights flashing and horn blaring (1). I pressed the panic button again to turn it off. Well, that turned out to be the wrong decision, because it was the other car in the garage that I had set off, and now both cars sounded like they were having a loud, angry conversation. By the time I finally convinced both of them to settle their differences, the whole show had gone on for about 30 seconds.
What struck me about the incident is that even though I hit the panic button, nobody panicked. I didn't see a host of neighbors come running down the street trying to find out what was wrong. Nobody called the police, nobody even knocked on the door later to find out if everything was okay. Why? Because false alarms like this happen so often, we’ve been conditioned to ignore them.
This reminded me of an automated GUI regression test suite that I used to run. It would come up with 50 or so failures on most runs, different each time, but they were usually found to be false alarms in one form or another, whether it was a script issue, a data issue, or known downtime for an integrated application. Every once in a while, there would be an actual problem found by the suite, but the testers became so numb to false alarms that we had a hard time recognizing a real problem when we saw it.
In the most recent BBST class I completed, Bug Advocacy, Cem Kaner gave us an introduction to Signal Detection Theory (SDT) and how it can be applied to bias in finding or not finding bugs (2). SDT is designed to measure the way we make decisions under uncertainty. There are many applications of SDT to testing, but as it relates to my example, experiments have shown that biases can be created that make us less likely to detect a real problem when we don’t expect to find one. A popular example of this involves radar operators in England (3). With only blips on a radar screen to go on, the operators had to decide whether each one was a bomber or something harmless like a bird. During World War II, the operators were biased towards deciding radar blips were bombers. After the war, with no apparent threat, they were biased towards deciding they were birds. In the same way, running tests over and over that generate only false alarms can create a bias in testers towards deciding that a failed test or odd program behavior is not a bug, but just a “bird”.
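For readers who want to see the bias idea in numbers, here is a minimal sketch of the two standard SDT measures: sensitivity (d') and the decision criterion (c). It is not taken from Cem's slides or from Lloyd and Appel. The function name and the hit and false-alarm rates are made up for illustration, and the formulas assume the usual equal-variance Gaussian model.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian model.

    Both rates must be strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa            # sensitivity: how well signal is told from noise
    criterion = -(z_hit + z_fa) / 2   # bias: negative = quick to say "bomber"
    return d_prime, criterion


# Hypothetical rates, chosen only to illustrate the point.
wartime = sdt_measures(hit_rate=0.90, false_alarm_rate=0.30)
peacetime = sdt_measures(hit_rate=0.70, false_alarm_rate=0.10)

for label, (d_prime, criterion) in (("wartime", wartime), ("peacetime", peacetime)):
    print(f"{label}: d' = {d_prime:.2f}, c = {criterion:+.2f}")
```

With these invented rates, both operators come out with the same sensitivity, but the criterion has opposite signs: the wartime operator leans towards "bomber" and the peacetime operator leans towards "bird". Nothing about their eyesight changed, only their expectations, which is the same shift a steady diet of false alarms can produce in testers.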
So the lesson I take from these examples is that as a tester, test lead, or manager, I need to be careful about the biases I'm creating. If my testers and I are constantly wading through false alarms, we’re going to become biased away from finding real problems. If my tests (or testers!) act like nervous, hysterical people over every blip on the radar, I shouldn’t be surprised when I miss the incoming missile.
1. Yes, I borrowed that analogy from a great Jerry Seinfeld monologue.
2. For Cem’s slides on this topic, see here, page 88.
3. See Lloyd and Appel’s discussion of radar operators at the bottom of the first page here.
