
Performance bug patterns and bug-hunting

heuristics | performance testing | performance testing patterns
One way to test performance is to write scenarios, load the system with them, and then spend hours examining why performance is not as good as you want, or even why the system crashes at a certain load.
Another way is to search directly for the issues that load tests would eventually reveal. I’m going to list patterns for those issues and hints for finding them without blindly loading the system.


First of all, some background/context.
I don’t call myself a performance tester, although I practiced it for some years and now typically manage performance tests done by either testers or developers. So I’m not going to compete with those who know Java garbage collection mechanisms, web application server architecture, etc. I’m going to take a somewhat black-box approach while describing common performance issue patterns.
Secondly, one of my hobbies is math. To be exact, I give lessons in combinatorics to undergraduates, preparing them for the International Mathematical Olympiad. Guess what: I prefer using my brain (analyzing, modeling, calculating and recognizing performance issue patterns) instead of frantically scripting use cases, running them again and again, tuning both scripts and applications, and finally showing numerous graphs to management to convince them that our tools are great at collecting data.
And last but not least, I have always had support from developers during performance testing and have never seen them blame the tools (probably because LoadRunner is so well recognized). They have supported me with code, information about DB structure, architecture, etc. I was lucky to have built-in logging (either as a debug or as a normal feature) in most of the applications I have tested. Only sometimes did they not believe my calculations, which I then backed up by emulation afterwards... :).


Performance issue patterns
· Issue number one is what I call “data scalability”. Simply populate each table that is supposed to grow dynamically with at least some 10,000 rows, and the main tables (the ones supposed to be huge) with some million rows. You will need a script for this or a copy of the production system; developers typically provide me with such scripts. You will then see a lot of issues even in functional testing: some requests will run for half a minute instead of a second or two.
· Client request returning a dynamic set of data (e.g. a table or a tree). Although it is self-evident, it is unfortunately still common that data are retrieved from the database row by row or item by item instead of with a single SQL statement. I’m not going into details, as I believe this is a really clear case and it is clear how to test it: just increase the data set to be retrieved and see whether the time increases linearly (wrong) or logarithmically or something like that (right).
· Lack of indexes in the database. Simply monitor DB CPU usage: if a single client call uses a significant amount of CPU resources (see the CPU usage monitoring hints under the last item, “Lack of CPU resources”), the DB will become slow under load.
· Simultaneous client requests trying to update the same data: a database row/cell, a file in the file system, or anything else. Just analyze what business data are shared among different users and which activities update the same value. A typical case is taking the next item out of a common queue of tasks. Get 3 or 4 computers, try to execute this function simultaneously (press the submit button at the same time) and look for issues (e.g. two users get the same item :-). Note: if you have an MS SQL DB, it does page locking by default, which means that two requests updating two different rows that are next to each other will still lock each other... There are probably more such technical issues in other tools that I’m not aware of.
· Simultaneous client requests causing the server to use the same temporary file in the file system. I believe this case is clear: you need to either do a code review or use specific tools that monitor the file system to detect those issues.
· Simultaneous client requests causing the server to execute a thread-unsafe 3rd-party library call (e.g. MS Word converting documents). Either the developers forgot to add a semaphore, or they added one but it causes all threads except one to return an error to the client or to wait for a long time. Here I suggest involving developers, using tools that examine DLL (or similar) usage, or reading the development documentation to find out which 3rd-party tools are used and then reading those tools’ documentation on thread-safety support.
· No or weak support for load balancing when processing user requests. Well, I don’t have good experience with this one... Still, it is a wrong assumption that this can only be tested under load. Use only a few computers as clients and closely monitor how the load is balanced across your hardware (using both resource usage tools and debug logs). Try to review the architecture: is the server (e.g. EJB) stateless, or does it store some dynamic info per user?
· No or weak load balancing for background operations. This is a tricky one. Example: two systems need to communicate with each other in the background. The simplest way is to pass all data in historical order. This makes the data processing logic simpler and faster. However, it means you can’t scale your system by adding more computers to process the data in parallel. Another example: a process that goes through a table and processes each record somehow. If the process is slow you may want to run multiple instances, but then you need to add a record locking mechanism that will slow down each separate process. To detect those issues, simply ask whether it is possible to run two or more instances, or try it.
· I’m not going to discuss batch processing or server applications that work in synchronous mode, searching for the next data item and processing it synchronously. I believe it is clear how to test this. I also suggest looking at the CPU usage monitoring hints under the last item, “Lack of CPU resources”.
· I have little experience with huge-traffic issues, but I believe it is possible to monitor traffic the same way as I suggest monitoring CPU below; you just need to know how to use this data :). However, I have typically observed that huge data traffic comes along with one of the issues described above, such as “Client request returning a dynamic set of data”.
· Lack of CPU resources once the number of users becomes high enough. It tends to happen because specific (localized) functions overuse the CPU (they are not optimized). The result is that the CPU is utilized up to 100% for a short period of time (e.g. a second). This does not cause a bad response time for a single user. However, once multiple users start doing it, the CPU becomes overloaded. There is a simple hint for locating those issues within single-user manual testing. See below.
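As a rough illustration of the first two patterns in the list above, here is a minimal Python sketch. It is not taken from any real project: it assumes SQLite as a stand-in database and a hypothetical `orders` table, and `CountingConnection`, `populate`, `fetch_row_by_row` and `fetch_in_one_statement` are names made up for this example. It fills the table with 10,000 rows and then retrieves the same 1,000 rows once item by item and once with a single SQL statement, counting the statements issued.

```python
import sqlite3
import time


class CountingConnection:
    """Thin wrapper (hypothetical helper) that counts executed SQL statements."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def execute(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params)


def populate(conn, n_rows):
    """Create the assumed 'orders' table and bulk-insert rows with ids 0..n_rows-1."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = ((i, f"customer-{i % 500}", float(i % 97)) for i in range(n_rows))
    with conn:  # commit the whole load as one transaction
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def fetch_row_by_row(db, ids):
    """Anti-pattern: one SELECT per requested item."""
    result = []
    for order_id in ids:
        cursor = db.execute(
            "SELECT id, customer, amount FROM orders WHERE id = ?", (order_id,)
        )
        result.append(cursor.fetchone())
    return result


def fetch_in_one_statement(db, first_id, last_id):
    """Preferred: a single SELECT that returns the whole set."""
    cursor = db.execute(
        "SELECT id, customer, amount FROM orders "
        "WHERE id BETWEEN ? AND ? ORDER BY id",
        (first_id, last_id),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    raw = sqlite3.connect(":memory:")
    populate(raw, 10_000)
    db = CountingConnection(raw)

    start = time.perf_counter()
    fetch_row_by_row(db, range(1, 1001))
    slow_time, slow_statements = time.perf_counter() - start, db.statements

    db.statements = 0
    start = time.perf_counter()
    fetch_in_one_statement(db, 1, 1000)
    fast_time, fast_statements = time.perf_counter() - start, db.statements

    print(f"row by row:    {slow_statements} statements, {slow_time:.4f} s")
    print(f"one statement: {fast_statements} statements, {fast_time:.4f} s")
```

With an in-memory database the time difference may look small. Against a real database server every statement usually also costs a network round trip, so the statement count is the number to watch.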
Hints on monitoring resource utilization in manual testing:
Monitor the CPU time delta instead of average utilization. In Windows Task Manager it is possible to add the column “CPU Time” in addition to CPU (usage). It shows for each application how much CPU time it has used since it was started. Now, if you see the value 23 (seconds) and after submitting a single user action you see it increased to 25, it is an issue: this is not acceptable for a server application. You can do the simple math yourself: at 2 CPU-seconds per request, a single CPU is fully occupied (150 × 2 s = 300 s) once 150 users each submit this request every 5 minutes on average, and from that point the system will start lagging.
You could also do the same request several times in a row and take the average. I suggest using slow hardware to see those issues better.
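The arithmetic behind this hint can be sketched in a few lines of Python. This is only an illustration under assumptions: `cpu_seconds_per_action` and `saturation_users` are hypothetical helper names, and `time.process_time()` measures the CPU time of the current Python process, standing in for the per-process CPU time you would read from Task Manager for the server. For a request that costs 2 CPU-seconds, one CPU is fully busy when 150 users each submit it once every 300 seconds (150 × 2 s = 300 s); at one request per user every 5 seconds the same CPU would already be saturated at 2.5 users.

```python
import time


def cpu_seconds_per_action(action, repetitions=5):
    """Average CPU time (user + system) this process spends per call of action()."""
    start = time.process_time()
    for _ in range(repetitions):
        action()
    return (time.process_time() - start) / repetitions


def saturation_users(cpu_seconds_per_request, seconds_between_requests, cpu_cores=1):
    """Number of users whose requests alone consume all available CPU time.

    Each user needs cpu_seconds_per_request of CPU out of every
    seconds_between_requests of elapsed time, on each of cpu_cores cores.
    """
    return cpu_cores * seconds_between_requests / cpu_seconds_per_request


if __name__ == "__main__":
    # The example from the text: CPU time grows from 23 s to 25 s for one action.
    cpu_delta = 25 - 23
    print(saturation_users(cpu_delta, 300))  # one request per user every 5 minutes
    print(saturation_users(cpu_delta, 5))    # one request per user every 5 seconds

    # Averaging over several repetitions of a made-up CPU-heavy action.
    average = cpu_seconds_per_action(lambda: sum(i * i for i in range(200_000)))
    print(f"average CPU time per action: {average:.4f} s")
```

The estimate ignores queuing effects, so response times would be expected to degrade noticeably before the calculated number of users is reached.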
One more hint. Most of the client-server apps I’ve seen don’t perform asynchronous operations while processing a single client request (while waiting for a DB request to complete, they do not perform any business logic or file operations in parallel). It means that if you take the actual response time and subtract the sum of all CPU times, you get the time spent on network and file operations. This is still tricky, as Oracle, for example, is able to use several CPUs for a single SQL execution.
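A minimal Python sketch of this subtraction, assuming a single-process, single-threaded request: `split_response_time` is a hypothetical helper, and the sleep in `fake_request` merely stands in for waiting on a database or the network. In a real client-server setup you would sum the CPU time deltas of all server processes involved instead of reading `time.process_time()`.

```python
import time


def split_response_time(action):
    """Return (wall_seconds, cpu_seconds, waiting_seconds) for one call of action().

    waiting_seconds is wall time minus CPU time of this process, clamped at zero
    because CPU time can exceed wall time when several CPUs work in parallel.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    action()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, max(wall - cpu, 0.0)


def fake_request():
    """Made-up request: some waiting (sleep) followed by some computation."""
    time.sleep(0.2)                      # stands in for DB or network wait
    sum(i * i for i in range(100_000))   # stands in for business logic


if __name__ == "__main__":
    wall, cpu, waiting = split_response_time(fake_request)
    print(f"response {wall:.3f} s = CPU {cpu:.3f} s + waiting {waiting:.3f} s")
```

The clamp at zero reflects the caveat in the text: once a single request is served by several CPUs at once, the simple subtraction no longer gives a meaningful waiting time.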

Exceptions and extension to those patterns
First of all, resource leaks. You can monitor memory usage while doing functional testing, and I encourage you to do so. Monitor not only memory but also critical resources, such as the non-paged pool on Windows. Still, I would never be comfortable saying that there are no stability issues without having automated tests run for at least several hours, and better for at least 48.
The second is that I typically try to encourage developers to do some performance tests themselves for client-server applications. By reusing client code they can quite simply write a trivial application that performs a few typical client steps in a loop. It should not be hard to add threading to this code. You will not get any reusable results for capacity planning, but you will get great benchmarks for regression testing and nice, simple stability tests. You will also get your developers to think about performance at least a little bit.
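Such a trivial looping client could look roughly like the Python sketch below. It is only an outline under assumptions: `run_load` is a hypothetical helper, and `client_step` is whatever function the developers expose from the reused client code (here replaced by a short sleep). It runs the step in a loop on several threads and collects response times and errors.

```python
import threading
import time


def run_load(client_step, users=5, iterations=20):
    """Call client_step() in a loop on `users` threads.

    Returns (durations, errors): the response time of every successful call
    and the exception of every failed one.
    """
    durations = []
    errors = []
    lock = threading.Lock()  # protects both result lists

    def worker():
        for _ in range(iterations):
            start = time.perf_counter()
            try:
                client_step()
            except Exception as exc:  # keep going so one failure does not stop the run
                with lock:
                    errors.append(exc)
            else:
                elapsed = time.perf_counter() - start
                with lock:
                    durations.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(users)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return durations, errors


if __name__ == "__main__":
    # Made-up client step; replace it with a few typical calls of the real client code.
    durations, errors = run_load(lambda: time.sleep(0.01), users=5, iterations=20)
    print(f"{len(durations)} calls, {len(errors)} errors, "
          f"slowest {max(durations):.3f} s, "
          f"average {sum(durations) / len(durations):.3f} s")
```

Keeping the numbers from each build gives the regression benchmark mentioned above, and raising `iterations` turns the same loop into a simple stability run.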

P.S. I will probably extend this list as I remember more stories, but I believe the most significant items are listed here.

Investigation

I'd say that investigation is seldom mentioned. First, some performance investigation is still done by developers and functional testers: when they get a response time of several minutes for a web page, they usually investigate it. Second, I guess every good performance tester does some investigation but usually doesn't even mention it. Investigation is something that is difficult to put into a test plan, as is problem fighting. I, for example, usually ended up with a plan like: scripting - 1 week, running performance tests - 1 week. In reality it might be scripting - 1/2 day, running the first test - 1/2 hour, then investigation and fighting problems for the rest of those two weeks (and often much longer).

Probably performance design/development and early performance investigation are the weakest link today. While most corporations have started to use "classical" performance testing (validation), it is already too late to change the design at that point. Plus, the system falls into the performance tester's hands completely unknown, so he has to spend a lot of time investigating it (while having very limited time for tests, so quite often the testing is very limited and formal).

Of course, the borders between performance design, testing, and management are very blurred, and many things can be placed in each category. Still, LoadRunner is mainly a performance testing tool. It helps with tuning and capacity planning during performance tests, but not directly in production (which is what performance management is about, in my understanding).

Regards,

Alex

I had an impression investigation is seldom used

Alexander, thank you for the comments! I agree with you.
Just as in functional testing there is a place for unit, integration, system and acceptance testing, it should be understood that in performance there is a place for each of the different activities you have listed on your site. The reason I wrote this is my impression from QAForums and some publications that validation is typically used as the only performance-related activity on a project. With this post I tried to advocate that investigation is faster and more effective at locating software bugs related to performance.
I believe using tools to emulate load is more effective for performance tuning and capacity planning, but on your site I found LoadRunner in the chapter on testing, not in the chapter where tuning and planning are addressed...
Regarding terminology - I've blogged about QA term misuse in general and performance testing is probably even worse. I'm impressed with your effort to solve this issue. I typically start any performance test effort with communicating terms and even theory.

investigation vs. validation

Ainars,

I am getting the impression that you are trying to contrast investigation with validation (to follow Scott's terminology) in your posts and are saying that we need to investigate, not validate, while we need both. It also looks like you are mixing what to do with how to do it and which tools (in the wide sense of the word, including a program you write and Task Manager) to use. A commercial load testing tool could help you a lot in the investigation phase if you have it anyway.

Alex

Why contrast?

I completely agree with what you are writing, but why do you contrast what you say with "to write scenarios and load system with them and then spend hours in examining why your performance is not as good"? (I guess that by scenarios you mean scripts for record-and-playback load testing tools, do you? If so, I don't agree with that definition of "scenario".) In my opinion, both (more exactly, all) approaches complement each other. Yes, you can look at single-user response time and CPU consumption (and really should do that); you provide a fine example. But you can run out of threads somewhere at 50 users and the other 100 will line up... And performance testing will show you that. And when you need, say, 1,000 users on one box, you can learn about the existence of OS parameters you never heard of before.

Another thing is that we probably should somehow define the terminology: I wouldn't put using performance patterns under testing. On my site, for example, I separated performance-related tasks into three categories: performance design (and development), performance testing, and performance management. All parts are important, and missing any one of them may be disastrous. Perhaps you sometimes try to compare apples and oranges: you don't often need to choose, you can eat both.

Regards,

Alex

LoadRunner questions

I want to use LoadRunner on my project for testing a web-based application. I need some information. Can anyone help me out?
1. What test data would I need for testing an e-commerce website application?
2. What are the basic criteria that I have to focus on when testing initially?
3. Is there any server setup needed in order to use LoadRunner for my e-commerce website?

Welcome to Performance Testing...

This is exactly what inspired my ST&P Column, Better Software Article and CMCrossroads presentation about performance investigation vs. validation (I use the catchphrase of "Investigate Early, Validate Last").

If you don't have access to the column and article, let me know and I'll get you the links.

Scott

--
Scott Barber
Chief Technologist
CEO & President
PerfTestPlus, Inc.
sbarber@perftestplus.com
http://www.perftestplus.com
