Specification by Example

'Internet radio' has come a long way since it first began. Long gone are the days when you had to seek out a station that plays the music you like, or several stations for those with more eclectic taste. Instead, services like Pandora.com and Last.fm create a personalised radio station that matches the user's own taste.

These personalised internet-radio stations are far more sophisticated than just specifying what genres and styles you like. In fact, you don't do that at all. With last.fm you provide one or more examples of songs that you like by uploading your music library data to their site. Software analyses the songs in your library finding common aspects of the music's 'DNA' - including genre, tempo and countless other sound characteristics in their database. From this they create a user-specific internet radio station that matches the user's taste. As you listen, you give them feedback saying which songs you love and which ones you hate. Your future play-lists are refined with each piece of feedback you provide. This feedback is in itself more examples of songs that do and don't match your taste.

That is Specification by Example!

And, as several have been saying for years... Acceptance Test Driven Development is just that - Specification by Example - applied to software development.

P.S. it ain't 'just' about examples

James wrote:

A Great Example of How it Ain't Just About Examples

How different your illustration is from your opening post! How much richer it is!

Indeed - there are many other skills being employed. Nobody said that the short-title of the concept was the beginning and the end - it's merely to characterise the prominent element of the approach.

As I said - my original post was intended to provide an analogy - nothing more. I appreciate the discussion that followed, however.

Antony Marcano

It is iterative...

Here is another illustration...

Four days into the iteration where the Password Strength story is being implemented, the acceptance tests pass and Ben, a talented software tester in the team, decides to pick up the exploratory testing task. He wasn't involved in the original discussions but he was the first person to be available and this was the highest priority task at the time. During a testing session, he attempts to put in an easy to remember password that still has numbers and mixed-case letters but the system doesn't accept it:

Enforc3r

The reason is that the team inferred more detail from the examples than was intended. They assumed that one of the important characteristics was that the password was not a single recognisable word.

The implementation checks the password against a dictionary of words - compensating for numbers that are known substitutes for letters - such as 3 for E.
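
To make that over-inference concrete, here is a minimal sketch of what such a check might look like. The word list, the substitution table, the 8-character minimum and the function names are all assumptions made for illustration; this is not the team's actual code.

```python
# Hypothetical sketch: the substitution table and word list are illustrative only.
LEET_SUBSTITUTIONS = str.maketrans(
    {"3": "e", "4": "a", "1": "i", "0": "o", "5": "s"}
)

# A tiny stand-in for a real dictionary of words.
DICTIONARY = {"enforcer", "password", "trapped"}


def normalise(password: str) -> str:
    """Lower-case the password and undo common digit-for-letter swaps."""
    return password.lower().translate(LEET_SUBSTITUTIONS)


def is_dictionary_word(password: str) -> bool:
    """True if the normalised password is a single known word."""
    return normalise(password) in DICTIONARY


def is_accepted(password: str) -> bool:
    """Mixed case, a digit, at least 8 characters, and not a disguised word."""
    has_lower = any(c.islower() for c in password)
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    long_enough = len(password) >= 8  # assumed minimum length
    return (
        long_enough
        and has_lower
        and has_upper
        and has_digit
        and not is_dictionary_word(password)
    )


print(is_accepted("Enforc3r"))       # rejected only because of the word check
print(is_accepted("tR4pp3dsqu15h"))  # still accepted
```

Under these assumptions "Enforc3r" normalises to "enforcer" and is rejected, even though it has mixed case, a number and enough characters.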

Half an hour later, Ben finishes his exploratory testing session. This was the only problem he happened to find during that 45 minutes. He looks over to his on-site customer (product owner), shows them the password and says "can I ask a question? - Do you think this password should be allowed?". The product owner replies "sure it should but let's check with the InfoSec manager". They walk over to see the InfoSec manager who appreciates Ben's point about the ease with which a password should be remembered and is satisfied that the additional combinations resulting from mixed case and numbers are enough of a deterrent to a brute-force attack.

Ben goes back to his desk and turns around for a quick chat with the programmer that last worked on it (he knows who this is because this team has daily stand-ups and a sticker of the developer's photo is stuck to the task card on the iteration board). The programmer has just finished another piece of work and says "oops - ok, look I can implement it now..." Ben sits with him. The developer opens the acceptance test in his checked out copy of the project. Ben adds the example and they run the acceptance test. It fails - as we expect. The developer then changes the test for the PasswordSecurityOracle class so that it includes such an example (he does this because he prefers to be as comprehensive as possible - but could have equally decided this wasn't necessary since it was covered by the acceptance tests). He runs the test... it also fails - as expected.

The developer then deletes the unit test and class for the PasswordDictionary class and removes the calls to it in the method in the PasswordSecurityOracle class.

The developer then re-runs his unit tests and they pass - and then runs the acceptance tests for that story and another one he thinks might be affected - they pass. He checks in the code and the build server reports a green build after running all unit tests and a second build project then running all the acceptance tests. Ben takes a new deployment into his test environment (because the team have automated this and it is even tested as part of the build process) and continues with his exploratory testing of the feature.

In this story - Ben could have left the programmer to get on with making the change... but he wanted to learn more about how they'd implemented the solution so sat with the developer. The developer was happy to narrate the changes he was making to help Ben get a better understanding of their design choices...

All this happened within the first 4-5 days of the iteration... even if it hadn't been discovered until after the iteration, it could very easily have been added as a new backlog item and implemented soon after. This is why the team is glad to have the benefit of short, two-week iterations.

-The End-

Does that help?

Antony Marcano

Antony, your comment contains a great example of specification by example.

However, as James states, as a tester I am interested in how the customer came up with their ideas about what is a good password. The starting example of "tR4pp3dsqu15h" may be a technically strong password but its security is lost as soon as users start writing down passwords they can't easily remember.

So which is more secure: a technically complex password that people are likely to write down and leave on or in their desk or a less complex password that people don't write down? The answer probably depends on the context.

When wearing my developer hat, I find it easy to get caught up in the technical implementation details of whatever I'm being asked to build. I love a good technical challenge. However, I also need to put on my critical thinking tester hat and question what I'm being told and my understanding of it. This questioning should not be an inquisition, but rather a dialog similar to that in your example.

Examples can help facilitate the dialog that refines a team's thinking about requirements. I fear that many fall into the trap (based on how I hear some talk) of thinking that the examples are good enough specifications without the dialog. They then end up being worse off than they were using traditional dialog-free technical requirements documents to drive development.


Ben Simo
QuestioningSoftware.com

A Great Example of How it Ain't Just About Examples

How different your illustration is from your opening post! How much richer it is!

I see that you used concepts, dialectic, and examples, just as I predicted you would. You drew inferences and tested the inferences with your customer. The tests that you point to are the result of a process of test design that is a great deal more than mere listing of examples. The test artifacts themselves are but the cold dead husk of a vibrant test design process that has prospective and inspective aspects (try to say that without spitting).

Why it's called specification-by-example is a mystery to me (I would call it "testing" or perhaps "specification-by-talking", both of which focus on the process), but I suspect that the people who named it that way were more enamored with the physical outcome (the test artifacts) than they were with how that outcome gains its value. The test code, while certainly useful, would certainly not be much use without the intellectual process you just showcased. In other words, that your tests are examples is 1% of the solution; that your tests are a powerful and sufficient set of examples seems like the other 99%, to me. Just as in your original post, I need to know the secret sauce that converts examples to concepts and then back to new and better examples.

That's why I like to talk about the intellectual process-- the skills and heuristics bit. For instance, I'm interested in how you talk to a customer to bring out hidden information. Maybe the customer's concern comes from a military standard that happens to define exactly what a strong password is supposed to be. If so, we may end up with very different examples, in the end.

Conceptual vs Actual

Well, my post wasn't really intended to convince - more to provide an analogy for those that still don't understand that the process is as much about requirements elicitation as it is about anything else the name might imply. But, nonetheless... trying to explain the benefits helps me (especially as I'm writing an article on it right now)...

In the case of last.fm and pandora I believe they use meta-data about each song from the music genome project. A distance algorithm finds 'closest' matches.
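
As a rough illustration of what 'closest match' could mean, here is a minimal sketch using Euclidean distance over made-up feature vectors. The song names, the three features and the averaging step are all assumptions; I am not claiming this is how either service actually works.

```python
import math

# Made-up meta-data: (tempo, energy, acousticness), each scaled to 0..1.
CATALOGUE = {
    "song_a": (0.80, 0.90, 0.10),
    "song_b": (0.75, 0.85, 0.20),
    "song_c": (0.20, 0.10, 0.90),
}


def centroid(vectors):
    """Average the feature vectors of the example songs."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def closest_matches(liked, catalogue, count=1):
    """Rank songs not already named by their distance to the liked examples."""
    target = centroid([catalogue[name] for name in liked])
    candidates = [name for name in catalogue if name not in liked]
    ranked = sorted(candidates, key=lambda name: math.dist(catalogue[name], target))
    return ranked[:count]


print(closest_matches(["song_a"], CATALOGUE))  # ['song_b']
```

The examples (liked songs) go in, and the 'rule' is whatever the feature vectors and the distance measure happen to encode.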

In the case of a team applying this - humans analyse the examples and try to understand what the underlying meta-data is and thus the rule... i.e. the examples are how you arrive at the conceptual. User stories narrow your focus so you aren't dealing with broad usages of the system... You discover the conceptual as you exchange examples (a la prospective testing) and seek out the detail that matters. This detail finds its way into an automated test that, at the least, will hint at the conceptual or even spell it out. In fact, I've found that the process of trying to write it as an automated test ensures that we tease out the detail that matters. It is a delicate balance, however, and there is always a danger of teasing out a bunch of detail that doesn't matter - where you find yourself being overly specific.

It isn't black magic but it is a skill. It would take me the better part of a book to get this skill across - but I can coach people in it (which I often do).

Often people do start with the concept and at that point it's my role to test that concept - with examples - using that process to determine whether I (and indeed the customer) understand that concept adequately to build something from it. Other times, the customer tells you what fields and buttons they want and it's a case of steering them to what they want to achieve rather than what implementation they have in their mind.

And, yes this is a task that benefits from inter-personal skills, analytical skills and testing skills - this is why it's often done with more than one person in the conversation - possibly a business analyst, certainly a developer and a tester along with the customer (or their representative).

How about an illustration...

Let’s say we have a user story like this...

"As a user I want to know the strength of password I’m entering, so that I don’t choose a weak one that makes my account vulnerable"

In this case, the InfoSec manager is representing the customer... Now, usually the conceptual rule is well established but, for the purposes of illustrating the point with something compact like this story, let's assume that this isn't clear yet. So, I say to her – give me an example of a strong password.

She writes down:

tR4pp3dsqu15h

I ask for an example of a weak password, she writes:

Pass

I say give me an example of what is the smallest change I could make to the first example for it to no longer be considered strong. We discuss it and she writes:

trappedsquish

So, I offer an example... would this be considered strong:

tRappedsquish

She says – "no... you're right - we need a number in there" so I suggest...

tr4ppedsquish

She says actually, that would be fine if it was like this:

Tr4ppedsquish

Or like this:

tR4ppedsquish

So, what's an example of the shortest acceptable password?

She shows me:

tR4pp3ds

I then say – so, these should be rejected (and I throw in an extra one)?:

Pass

tR4pp3d

trappedsquish

Trappedsquish

tRappedsquish

tr4ppedsquish

TR4PPEDSQUISH

So, I say "it has to be mixed case and alphanumeric, right?"

(I would do the same for the examples I think should be accepted - but we'll skip that part here).
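
The rule that falls out of this exchange can be sketched in a few lines. The function name and the minimum length of 8 are my own inference from the examples ("tR4pp3ds" accepted, "tR4pp3d" rejected), not something the InfoSec manager stated outright.

```python
MIN_LENGTH = 8  # assumption: inferred from "tR4pp3ds" (ok) vs "tR4pp3d" (rejected)


def is_strong(password: str) -> bool:
    """Mixed case, at least one digit, and at least MIN_LENGTH characters."""
    return (
        len(password) >= MIN_LENGTH
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
    )


# The examples from the conversation, used to check the inferred rule.
ACCEPTED = ["tR4pp3dsqu15h", "Tr4ppedsquish", "tR4ppedsquish", "tR4pp3ds"]
REJECTED = [
    "Pass",
    "tR4pp3d",
    "trappedsquish",
    "Trappedsquish",
    "tRappedsquish",
    "tr4ppedsquish",
    "TR4PPEDSQUISH",
]

for example in ACCEPTED:
    assert is_strong(example), example
for example in REJECTED:
    assert not is_strong(example), example
```

Every example from the conversation agrees with this rule, which is exactly the kind of inference I would read back to the customer before building anything.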

I then say if it is too weak, what should we tell the user?

She says "show an error message".

I say, "so, for example it might say" and I write:

‘please make your password stronger by having numbers and mixed case letters’

She says "yeah that’s good, but make sure they can’t change the password unless it’s strong enough. Show them it’s strong enough by highlighting it in green or something and then allow them to submit the new password."

For this illustration I'm only going to deal with the weak-password case.

To express as a test, I’ll use this format:

Given -Initial context-

When -Actions/events-

Then -Expectations-

Test: single case alphanumeric should be seen as too weak

Given the user has verified their identity

When the user changes their password to tr4ppedsquish

Then they are told “please make your password stronger by having numbers and mixed case letters”

-and the new password cannot be submitted

Then, I’ll look at that and think – if I have to write this out for every example not only will I use up a lot of time, it will be less communicative of the intent... so I’ll change it to:

Test: non mixed-case passwords without numerics should be seen as too weak

Given the user has verified their identity

When the user changes their password to a weak password

Then the password cannot be submitted

- and they are given the weak-password message: "please make your password stronger by having numbers and mixed case letters", for example:

Pass
pass
tr4pped
trappedsquish
Trappedsquish
tRappedsquish
tr4ppedsquish
TR4PPEDSQUISH
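
Before reaching for a tool, the same example table can be sketched as a plain table-driven check. Everything here is hypothetical: `change_password` stands in for the real system, and the strength rule inside it (mixed case, a digit, at least 8 characters) is my reading of the examples.

```python
WEAK_PASSWORD_MESSAGE = (
    "please make your password stronger by having numbers and mixed case letters"
)


def change_password(new_password: str):
    """Hypothetical stand-in for the system: returns (submitted, message)."""
    strong = (
        len(new_password) >= 8  # assumed minimum length
        and any(c.islower() for c in new_password)
        and any(c.isupper() for c in new_password)
        and any(c.isdigit() for c in new_password)
    )
    if strong:
        return True, None
    return False, WEAK_PASSWORD_MESSAGE


# The examples from the test above.
WEAK_EXAMPLES = [
    "Pass",
    "pass",
    "tr4pped",
    "trappedsquish",
    "Trappedsquish",
    "tRappedsquish",
    "tr4ppedsquish",
    "TR4PPEDSQUISH",
]


def run_weak_password_examples():
    """Return the examples that did NOT behave as the test expects."""
    failures = []
    for password in WEAK_EXAMPLES:
        submitted, message = change_password(password)
        if submitted or message != WEAK_PASSWORD_MESSAGE:
            failures.append(password)
    return failures


print(run_weak_password_examples())  # [] when every example is rejected
```

Adding a newly discovered example is then a one-line change to the list, which is the property that makes this style of test cheap to maintain.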

This type of test can easily be automated in Fit or Concordion. We know that to verify whether we've implemented it correctly we'll have to execute this a few times before it's completely right so - it's a test worth repeating. Once this test is passing, I might do some inspective exploratory testing to see if there was anything we missed... In this particular illustration it's highly likely but when I do, it's straightforward enough to add that new example to the list of examples.

If the customer does know the conceptual generalised rule to start with then this process is often quicker... Instead you'd simply create a set of examples that match the rule and run it by the customer... In more complex cases than this example I've found that offering examples based on their generalised rule flushes out gaps in their thinking and by the end of the conversation the rule has changed.

We've essentially arrived at a set of real-world examples of how the system is expected to be used, making it easier to validate the system (in the Boehm sense of the word - am I building the *right* product).

I've found most real-world examples of 'generalising specifications' very hard to validate against because they often lack real-world examples of the data to be used or the examples are insufficient, derived from the general. I think Specification by Example more often works the other way round - it starts with the example and derives the conceptual rule (albeit focused by the user story you are working on)... or if it starts with the rule, it is tested by examples before we try to implement it... those examples mature into tests that allow us to validate the software against the current understanding of the need... feedback is then used to alter the software... frequent change to the software is made faster by having automation to do much of the repetitive tests, allowing testers to focus their time on discovering new tests through inspective exploratory testing.

Does that (slightly over-simplified) illustration demonstrate what I'm talking about? Is it clearer as to why people call the outcome an 'executable specification'?

But how does it work?

Antony, the way you are using this example is a great example of why specification-by-example has me worried. (Notice how I used the word example in three different contexts in that one sentence? That's part of what I'm worried about: the meaning of any object, such as a word or an example, changes with context.)

You're showing us an example that is missing the KEY part: How does the system decide what the examples mean? It sounds like a mystical process. Perhaps it's a Bayesian calculation. Perhaps it's a neural net. Maybe a human reviews them. But what variables does it consider, and why should we think that its decisions are good enough?

A traditional spec is conceptual-- tries to get directly at meaning. Examples alone don't get at meaning at all. The reason why the music thing works, if it works, is because someone behind the scenes has decided (perhaps well, perhaps poorly) what the examples mean.

I think there are three elements that work great together: concepts, examples, and dialectic. Knowing you, I bet you use all three. You have an idea (concept), you think of a few examples, then you discuss them and try them (a dialectic process).

In any case, whenever someone gives me an example, I feel I should tell them what I infer to be relevant about that example. Otherwise I could get a very wrong concept in my head.

-- James
