
My WOPR7 diary. Feel like a peer

WOPR7, with the theme "Agile Performance Testing", is now over and you can already find some blog posts about it. Mine is a long one. Moreover, my primary goal is to show you what it feels like to be at a peer conference and what ideas I took from it, rather than to describe the talks in detail. So if you don’t have enough time to read it, look for other participants’ blogs on this site.


Special thanks to the people who inspired me (even if they don’t know about it :) to attend this wonderful event: Scott Barber, Antony Marcano, Neil McCarthy and my wife.
About or disclaimer
First of all, I suggest you read the preface. You should know that my intention with this blog is not to describe the details of the event. My goal is to share my conclusions, perceptions and even feelings, and to add some further thoughts.
Moreover, I believe the primary audience for this writing is those who participated in the event (while I also hope that those who did not participate will be able to learn what a wonderful experience it was):
Antony Marcano (Content Owner), Julian Harty (Logistics co-ordinator), Paul Holland (Facilitator), James Bach, James Bull, James Dobson, Dan Downing, Rachida El Amraoui, Richard Florence, Antony Gorman, Julian Harty, Paul Holland, Gordon McKeown, Raymond Rivest and me.
This blog may also be updated without notice, based on the feedback I hope to receive.
Post-diary: this is how I wrote it
I am used to writing diaries like this. The text is in the form of a diary, although I am writing it only now, a few days later. This way I can filter out all the unnecessary information and express myself more concisely. However, it means that I will write in the first person about my own subjective view of everything that happened: 24 hours a day, all 5 days, including those I spent traveling.
I will also use italics below to mark text that is meant to have been added to the diary later on, not at the time-stamped moment.

Day 0. Meeting in a restaurant.
18:00. Just arrived at my hotel and phoned Antony to confirm my participation in the pre-WOPR dinner. As English is my 3rd language, this is a must-do for me to switch my brain from Latvian to English.

Midnight. Back from dinner. I can tick one item off my WOPR checklist: “talk to James Bach in person”. I was pleasantly surprised that he was also eager to meet me.

Past midnight. Can’t fall asleep without writing a little bit more. Several tables are joined into a single rectangle that we are all sitting around. James Bach is sitting in the middle of the far side, wearing a black Google hat. James Dobson was the one to ask him the question I did not dare to ask: why Google? As we had a third James in the team, it was decided to change names for the duration of WOPR: James Bach became Jim and James Dobson became Jamie. I will be using those new names from now on. Still, Google was not the reason why I remember Jim and Jamie. Jim, sitting in the center of the table, wearing this black hat, throwing puzzles at us all and leading most of the conversations, somehow reminded me of a pastor sitting behind his benefice. No, I don’t mean ...

... [dev]->[depl]->[perf] to a more iterative one. To implement Scrum, actually. The idea was to divide the 6 weeks of a Scrum iteration into 4 functional weeks and 2 final performance weeks, with the goal of having functionally stable software at the end of the 4th week. That was basically it. He also mentioned unit tests that reproduce the performance issues and serve as a preliminary indicator that performance has improved.

During the open season I caught one more idea about the iterative approach. If I forget about Scrum and only get development to provide functionally stable features from time to time, we could bring in [perf] people at that moment or later, depending on their availability, to provide performance information and feedback for [dev] to use in later iterations. This way we could use [perf] resources when they have time and not depend on other projects’ schedules. This is actually how we used functional test engineers for performance tasks when there was less work on functional tests, and how we used DBA resources in our project, which is far from Scrum or any formal Agile (capital A) methodology.

I found that I had not written anything about Julian’s report, which was given as a lightning talk because we were running out of time. It was about Googling from a mobile phone. It was less about performance and more about implementing automated validation using Google (the workstation search engine) as the oracle. As direct comparison was not possible, a “result-evaluation function returning an integer” was created, and if the values from the oracle and the system under test differ a lot, humans are brought in to compare them manually.
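The “result-evaluation function returning an integer” idea can be sketched roughly as follows. This is only an illustration under assumptions: the talk as recorded here does not say what the function actually measured, so the scoring rule (counting results), the 20% tolerance and all names below are hypothetical.

```python
def score(results):
    """Hypothetical evaluation function: reduce a result list to one integer."""
    return len(results)


def needs_human_review(oracle_results, tested_results, tolerance=0.2):
    """Return True when the two scores differ by more than `tolerance` (relative)."""
    oracle_score = score(oracle_results)
    tested_score = score(tested_results)
    baseline = max(oracle_score, 1)  # avoid dividing by zero when the oracle is empty
    return abs(oracle_score - tested_score) / baseline > tolerance
```

Only the cases where the function returns True would go to a person, so the manual effort is spent where the oracle and the system under test clearly disagree.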

Some general impressions of that day. The common theme today was skills. I’m not joking, although it was not evident. “Individuals and interactions over processes and tools”, with Jim’s bad example and Jamie’s and Julian’s good examples. At least that is my subjective perception. There are 3 more items in the Agile Manifesto:
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

I’m wondering which one it will be tomorrow.


Day 1 - dinner
This time we all met at a pub with no big table for everyone to sit around. It was also a pity that Jim was not with us, but I do know how it feels when a time zone change turns day into night. So 4 of us were sitting at a smaller table solving puzzles (provided by Paul, and a few more by me), talking about the problems of hiring testers (as I take part in hiring testers at our company, it was the best topic for me) and later playing poker.
One of the puzzles Paul gave to me directly. It does not matter what the puzzle was; what matters is that I eventually gave up. I know Jim would blame me for doing that. However, I believe that knowing when to give up is a significant trait of a tester. If you can’t solve an issue, ask for help. If you can’t reproduce a problem, either it will manifest itself later on when you don’t expect it, or there is little chance that it will manifest itself for any customer, or at least not reproducibly... Well, I know this is the path toward the non-reproducible blue screens of death of Windows, but I was used to this “feature” :).
Unfortunately I have no idea what others were talking about.

Day 2
After a quick check-in, Gordon McKeown is the first to talk this morning. The idea seems simple to me: the recorded scripts (made with his record-and-playback tool) are huge and include a lot of dynamic data, for example a session ID or a new quote ID, which you receive from the server and have to use later when communicating with it. So your recorded script may contain send_to_server(.....quoteID=”836948885“...), but the number 836948885 will change if you record exactly the same scenario again. For playback to work, you need to substitute it dynamically with the value from a server response: you send_to_server(new_quote_to_be_created) and receive ......new_quoteID=’836948886’. The performance tool I’m using supports this by defining variables and their capturing rules, e.g. capture_variable(quoteID, “new_quoteID=”, newline) will work if I know that there is always a newline after the quote ID.
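A capturing rule of this kind can be sketched in a few lines of Python. This is not the API of any particular performance tool: the function below is a hypothetical stand-in that takes the text between a left boundary and a right boundary, and the response and request formats are made up for the illustration.

```python
def capture_variable(response, left, right="\n"):
    """Return the text between `left` and the next `right`, or None if `left` is absent."""
    start = response.find(left)
    if start == -1:
        return None
    start += len(left)
    end = response.find(right, start)
    if end == -1:
        end = len(response)  # no right boundary: take the rest of the response
    return response[start:end]


# Hypothetical server response containing a freshly generated quote ID.
response = "status=OK\nnew_quoteID=836948886\nowner=test\n"
quote_id = capture_variable(response, "new_quoteID=")

# The captured value replaces the hard-coded one in the next request.
request = "quoteID={quoteID}&action=confirm".format(quoteID=quote_id)
```

The rule only works as long as the boundaries stay stable, which is the same caveat as knowing that a newline always follows the quote ID.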
So this is simple, isn’t it? Not when you’ve got 10 scripts to be re-recorded every few weeks, each with some 1000 dynamic values to be replaced. Quite a nasty job, isn’t it? The solution is automation. Yes, automate the reworking of the captured script into a working one using Python. That’s basically what Gordon’s experience was about. However, the technique he described was an extension to the script generator in his tool, which is a more powerful technique because the generator "understands" the target language (Java in this case) and can use reflection etc. That’s what inspires me to try this out the next time I need to do a job like that. And you know, I’ve done this job manually and it is a really nasty and boring one.
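The simpler, text-level variant of this automation might look like the sketch below. It is not Gordon’s generator extension, which works on the target language itself; it is a plain search-and-replace pass over a recorded script. The field names, the script format and the `{field}` placeholder convention are assumptions made for the example.

```python
import re

# Hypothetical list of fields whose recorded values are known to be dynamic.
DYNAMIC_FIELDS = ["quoteID", "sessionID"]


def parameterize(recorded_script, fields=DYNAMIC_FIELDS):
    """Replace hard-coded values of the given fields with {field} placeholders."""
    rewritten = recorded_script
    for field in fields:
        # Match field="<anything but a quote>" and keep the surrounding quotes.
        pattern = re.compile(r'(\b' + re.escape(field) + r'=")[^"]*(")')
        rewritten = pattern.sub(r'\g<1>{' + field + r'}\g<2>', rewritten)
    return rewritten


recorded = 'send_to_server(action="confirm", quoteID="836948885")'
print(parameterize(recorded))
# prints: send_to_server(action="confirm", quoteID="{quoteID}")
```

Run over every re-recorded script, a pass like this would turn the manual replacement of hard-coded values into a repeatable step, at the price of maintaining the list of dynamic fields.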

During the open season it seemed to me that not everyone realized what the experience was about, not even after Antony Marcano tried to help Gordon explain it. No surprise that the discussion drifted into alternative solutions such as UI-level scripts, and later into the reasons why the application was built in a way that made the scripts so huge, although there were at least two people in the room who had worked with even larger scripts (but without the need to re-record them iteratively). Moreover, there was an experience report about UI-level scripting on the next day. I felt this was the first failure of our WOPR7 team, but also the only one. So at one moment I even raised the red card and tried to explain why I thought it was a failure; I’m not sure I succeeded, however.

Anyway, we somehow made our way to the next experience report. Antony Gorman, a developer rather than a tester, shared his experience of a project adding non-functional features to an existing functional application. Despite using NUnit and doing a good, agile job of targeting the specified requirements, the project was still not on time. The failure (as Antony described it) lay in performance requirements that changed.
During the open season I came to the conclusion that the issue is rather simple. Once the new (non-functional) features had been implemented, it was a new system with new functional user stories, which the project somehow failed to realize because functionally the application provided the same features.

Next was James Bull with his story about following an Agile methodology. Although the performance requirements were included in the contract, they used Agile methodologies, which was a learning point for me, a person who has never seen Agile methodologies in practice. He even showed us a few real cards from the project: cards with performance-test-related activities, such as emulating spider software (the kind that collects information for search sites) connected to the site. The story, just like the one before, seemed to be an excellent example of how things should be done in Agile projects. But again, there wouldn’t be a story if everything had been just fine.... One of the main issues for James was data that was migrated too late: performance testing with a tiny database... is it performance testing at all? From my own experience (a negative one, as I made the same mistake myself once) I would say that functional testing on a real-size database is closer to performance testing than load testing on tiny data. But again, that’s my subjective view.

My conclusions for this day.
A small conclusion from the first story: scripts that parse and improve other scripts are an idea to bear in mind.
A huge conclusion from the last two: we mustn’t separate functional and non-functional features, because they interact with each other. Implementing a new functional feature may impact non-functional features, while implementing non-functional features will change the user scenarios. For example, once the site becomes reliable in terms of availability, users will no longer print or copy the search results; they will come back and repeat the search when they need to. It may also mean smaller result sets: for example, users will not search for all trains today but only for those in the current time interval, and if required they will return to search a different time interval.

Following the Day 1 practice, I try to nominate a topic of the day. If I skip the first, technical report, the topic appears to me to be “Customer collaboration over contract negotiation“, although I’m afraid that we saw more mistakes and fewer triumphs.
Have I missed something? Does anyone have experience of asking the customer to work with the system while load is being applied to it? Doing this iteratively? Asking the customer what the main concern is now: lack of functionality or performance issues? Prioritizing both functional and non-functional requirements, and changing the stories as both functional and non-functional characteristics change?

Day 2 dinner
This is the last dinner, as most of us are going home tomorrow. We are going bowling tonight and later to a restaurant. I don’t welcome the idea of bowling on an empty stomach, but I’ll survive.

Midnight. The dinner is over and I’m trying to fall asleep. Hot Indian dishes are still on my mind, as well as Antony swallowing TWO green chilies afterwards. That night we had informal conversations (about family and so on) and saw some magic tricks by Jim and Dan. I feel really exhausted but eager to take part in the last day of WOPR.
Day 3
This is the day of “Minimizing effort to get working performance tests and gathering reasonably believable performance information over comprehensive test documentation and credible test results”, to rephrase “Working software over comprehensive documentation”.
I feel quite tired this morning. It seems the group feels the same way. Dan has to work hard to wake everyone up and make sure the group catches his story. He introduces it as a lightweight methodology, which seems to mean only that no documentation other than the scripts themselves is created, and that the main goal of script creation is to get something that works as fast as possible, without caring how nice it is in terms of maintenance. Well, he says, at least the base URL is stored in a variable, so that moving from the test environment to the production environment only requires changing the host name in one variable instead of doing a replacement in all scripts.
However, there is an issue around which the story is built. Although there are only 3 scenarios, the amount of input data is huge (the site is image-intensive and the scripts basically retrieve different images by URL). They were lucky to get lists of the file URLs available on the server, so they could simply let the scripts go through those files. The issue appeared because all virtual users were supposed to use the same file, and the tool’s limitations caused file-locking problems. As a result, the bottleneck in the performance tests was the host running the load-generating tool. Following the lightweight approach, they simply cloned the load-generating hosts so that the number of virtual users on each host was reduced. During the open season he specifically pointed out that there could have been more elegant solutions, but the goal was to run the tests now, not to make them elegant.
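Dan did not say what a more elegant solution would have looked like, and his team did not do what follows. Purely as an illustration, one option would be to keep the base URL in a single variable, as he described, and to hand each virtual user or load host its own slice of the URL list so that nobody shares one input file. The host name, the paths and the function names are hypothetical.

```python
# Hypothetical host name; changing it once retargets every script.
BASE_URL = "http://test.example.com"


def partition(paths, workers):
    """Split `paths` into `workers` round-robin slices, one per virtual user or host."""
    return [paths[i::workers] for i in range(workers)]


def full_urls(paths, base_url=BASE_URL):
    """Prefix every relative path with the current base URL."""
    return [base_url + path for path in paths]


paths = ["/img/%d.jpg" % n for n in range(7)]
slices = partition(paths, 3)          # three independent input lists
first_worker_urls = full_urls(slices[0])
```

Each slice could then be written to its own file or kept in memory, which would remove the contention on a single shared file; whether that would have beaten simply cloning hosts in Dan’s situation is a separate question.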
Next to talk was Richard, who had been mostly quiet up to now; but once he started his talk, Paul (the facilitator) had to keep him focused on the experience because of his enthusiastic, joyful flood of words. In his experience, security made record and playback almost impossible (the recorder recorded encrypted, or even signed, data). So the simplest solution was to move to the API level, below the security layer. No recording: he has to write all the scripts himself. The positive aspect, however, is that he could add all the tests to the test suite in CVS, and the core team makes sure that those tests are not broken by the build (and fixes the scripts for him). Add a tool that can run those API-level tests, extended with the tool’s methods for recording response times, and you have a complete performance test system. He was also lucky enough to be able to use a database to store test data for the scripts: given that his scripts are API-level and the tool is integrated into the development environment (Visual Studio Team System), he could do whatever programming he wanted, even communicate with the DB.
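The core of an API-level performance test, a direct call wrapped in a timer, can be sketched as below. This is not Richard’s tool or the Visual Studio Team System API; `create_quote` is a hypothetical stand-in for whatever API function is under test.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call `func` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def create_quote(customer):
    """Hypothetical API function standing in for the system under test."""
    return {"customer": customer, "quoteID": 1}


result, elapsed = timed_call(create_quote, "ACME")
```

Because the test is ordinary code, it can live in version control next to the functional tests and read its input data from wherever is convenient, which matches the benefits described above.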

I’m sorry to have lost concentration during the open season. What Richard described is exactly the test approach that I practice in my current project and that I wanted to talk about. The only differences are the tools and the reasons for testing at the API level.

Next was Jamie with a FitNesse demonstration. As far as I understand, it is an extension to unit tests that makes them readable by business people, and even editable and extendable by them. Moreover, there is an option to define ranges, not only exact values, for the tests and the expected results.

Next up is me. I know what I’m going to talk about: as Richard covered the techniques, I’m only going to spend 2 minutes on our reasons for working at the API level, which is basically the absence of higher-level applications (yet to be developed). Then I’ll switch to preventing deployment issues by realizing that they will happen during deployment and client implementation. Yes, this is what we did during performance testing, and it was in no way a waste of the performance testers’ resources. I’m not sure if I succeeded at that, but at least I showed that there are “performance problems” that can be identified at the API level although we don’t have API-level performance requirements.
Now that I have finished, I also realize that I forgot to mention that we are one test team doing both functional and non-functional tests and, as it turned out, also solving deployment issues. So we have escaped the “throwing over the wall” issue simply by leading all the tasks (including working closely with the DBA to identify the set of indices).

Whatever I did, I have to keep listening. Next, and last, is Raymond. His story is basically this: several different systems (so different that different tools are needed to emulate load, even UI-level automation for one and record-playback for the second), common requirements and extremely limited time (a week). A week to: choose a tool; plan the scope; work out the load hardware requirements (UI-based test scripts need a lot of hosts); create the scripts and run them.
During the open season we return to the UI-scripts discussion that was started yesterday. The idea of reducing think times to a minimum and emulating only the throughput (not the real number of users) seems reasonable to me, as this is how I did my UI-level tests. However, there was quite an argument about that.
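One common way to reason about this trade-off is the interactive form of Little’s Law, N = X * (R + Z), where N is the number of concurrent users, X the throughput, R the response time and Z the think time. The sketch below applies it; the numbers are made up and are not from Raymond’s project.

```python
import math


def users_needed(throughput_per_s, response_time_s, think_time_s):
    """Virtual users required to sustain a throughput: N = X * (R + Z)."""
    return math.ceil(throughput_per_s * (response_time_s + think_time_s))


# Same target of 10 requests per second, 0.5 s response time in both cases.
realistic = users_needed(10, 0.5, 9.5)  # users pausing 9.5 s between requests
reduced = users_needed(10, 0.5, 0.5)    # think time cut to 0.5 s
```

With these figures the realistic scenario needs 100 virtual users and the reduced one only 10, which matters when UI-based scripts need a lot of hosts. What the formula does not capture is per-user state such as sessions and connections, which a smaller number of virtual users will not exercise.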
I also missed how he records response times. In my case we had a COM interface built into our UI applications, through which we were able to emulate user activities at the UI level. The home-made tool that sent the COM-interface requests was able to record the time from request sent to response received from the UI application and store it in a log file. Excel did the rest.
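The “Excel did the rest” step could equally be scripted. The sketch below assumes a hypothetical log format of one `operation,elapsed_ms` pair per line, which is not the format of the home-made tool, and summarizes it per operation using a nearest-rank 90th percentile.

```python
import csv
import io
import math
from collections import defaultdict


def summarize(log_text):
    """Return per-operation count, mean, 90th percentile and max (milliseconds)."""
    samples = defaultdict(list)
    for operation, elapsed_ms in csv.reader(io.StringIO(log_text)):
        samples[operation].append(float(elapsed_ms))
    summary = {}
    for operation, values in samples.items():
        values.sort()
        rank = max(1, math.ceil(0.9 * len(values)))  # nearest-rank percentile
        summary[operation] = {
            "count": len(values),
            "mean": sum(values) / len(values),
            "p90": values[rank - 1],
            "max": values[-1],
        }
    return summary


# Hypothetical log contents.
log_text = "open_form,120\nopen_form,80\nsave,300\nopen_form,100\n"
summary = summarize(log_text)
```

For the sample log, `open_form` has three samples with a mean of 100 ms, and `save` has a single 300 ms sample.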

A few more things happened later, but I was too exhausted to record them.

Going home
Day 4 morning. Yesterday I fell asleep a little past 20:00. I feel fresh this morning, ready to catch my train at 8:30. I have a ticket bought yesterday and plan to make the 10-minute walk to the station. I just realized that the only topic left of the 4 in the Agile Manifesto is “Responding to change over following a plan”. But then I realized that this is what all 3 days were about. What’s funny: as I was leaving the hotel I met Raymond loading his bags into a taxi. I quickly responded to this change and joined him. It turned out that we were taking the same train to London.

I’m on my train to the airport. 45 minutes to draw final conclusions ...

I’ve always been against running pre-planned load tests. No, I mean I’ve always been against supposing that this is the only thing to be done. And I know that developers or even project managers will have arguments like “no user will ever do that” which, applied to load, could sound like “this is not a realistic load/scenario, so we are going to ignore the issues it causes”. As a tester who practices both functional and non-functional testing at the same time, this is something I’m used to. The developers I work with seem to have learned that “a bug is anything that could bug the stakeholder”, and that the risk of an issue bugging the stakeholder has little to do with how reasonable the scenario was in which the tester found it. Only once developers have done the investigation, and both the effort to fix and the risk of not fixing have been identified and compared, can you make the final decision. Meanwhile, the mission of a tester is to search both for evidence of the software working correctly in the situations defined by the requirements and for evidence of the software not working as expected in any situation that deviates from the defined ones (which is infeasible, never-ending work, so you should sense when to stop).
As Gordon later pointed out in a private conversation, there was probably a cultural clash between two different experiences: capturing the requests that a real user’s interaction with the real application produces, and writing API-level tests that only hypothetically have something to do with realistic load. The first is more credible in the eyes of customers and BAs; the second, in the eyes of architects and probably those who practice systems thinking. Both approaches can lead to identifying serious performance issues, and the issues they find overlap, but not completely. There are things you will miss by doing either type and ignoring the other. Could you afford both, doing a lot of overlapping tests?


Epilogue
At the airport I found that there are new security rules and no liquids are allowed in hand luggage. As I have only one case, which counts as hand luggage, I’m throwing my toothpaste and shaving stuff into the trash bin. Anyway, it seems that this new functional requirement (a security feature) is new not only to me, and there is quite a queue to get through security. It turns out I forgot to remove my deodorant, so I lost 15 more minutes waiting in another queue for a security officer to check the contents of my luggage. It is now 5 minutes until departure. I’m running to my gate and get there exactly at the departure time. I’m not the last one, however, and we wait some 20 minutes more until departure. Problem solved, isn’t it? Or is it still a problem, as I was unable to spend my last 50 pounds in the shops (to buy some gifts for my family)? Cost of failure...