Performance Requirements
Submitted by Alexander Podelko on Fri, 27/10/2006 - 14:57.
Performance requirements (PR) are necessary for system design and development. If there are no written performance requirements, it just means that they exist in the heads of stakeholders, but nobody bothered to write them down and make sure that everybody agrees with them. PR then become input for performance testing (where they are validated) as well as for capacity planning and production monitoring (SLA - Service Level Agreement).
There are three classes of performance requirements: response times (how fast the system handles individual requests, that is, what a real user would experience), throughput (how many requests the system can handle), and concurrency (how many users or threads work simultaneously). All three classes are vital: good throughput with long response times is often unacceptable, as are good response times for only a few users.
Response times (for interactive work) or processing times (for batch jobs or scheduled activities) define how fast requests are processed. Acceptable response times should be defined in each particular case. A time of 30 minutes can be excellent for a big batch job but absolutely unacceptable for loading a web page in a customer portal. Although it is often difficult to draw the line here, it is largely a common-sense decision.
A lot of research has been done to define what response times should be for interactive systems, mainly from two points of view: what response time is necessary to achieve optimal user performance (for tasks like entering orders) and what response time is necessary to avoid web site abandonment (for the Internet). Most researchers agreed that for end-user response times of interactive applications (services, for example, are a completely different story) there is in most cases no point in making response times faster than 1-2 sec, and that it is better to show some kind of indicator (like a progress bar) if a request takes more than 5-10 sec.
Response time objectives for services and stored procedures should be determined by their share of the end-to-end performance "budget", which is defined by the end-to-end on-line response time or batch performance requirements (so that even the worst combination of all required services, middleware, and presentation layer overheads still provides the requested time). For example, if there is a web page with 10 drop-down boxes calling 10 separate services, the response time objective for each service may be 0.2 sec to get a 3 sec average response time: 10 x 0.2 sec = 2 sec for the services (assuming they are called one after another), leaving 1 sec for network, presentation, and rendering.
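The budget arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name and the even split across services are assumptions, and it presumes the services are called sequentially so that their times add up.

```python
def per_service_budget(end_to_end_sec, overhead_sec, service_count):
    """Split what is left of the end-to-end budget evenly across services.

    Assumption: services run one after another, so their times are additive.
    """
    if service_count <= 0:
        raise ValueError("service_count must be positive")
    remaining = end_to_end_sec - overhead_sec
    if remaining <= 0:
        raise ValueError("overhead consumes the whole budget")
    return remaining / service_count


# Figures from the example: 3 sec page, 1 sec overhead, 10 services.
budget = per_service_budget(end_to_end_sec=3.0, overhead_sec=1.0, service_count=10)
print(f"{budget:.2f} sec per service")
```

If some of the services were called in parallel, the additive assumption would no longer hold and the per-service objective could be looser.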
Response times for individual transactions vary, so we need to use aggregate values such as averages or percentiles when specifying performance requirements (for example, 90% of response times are less than a given value). Maximal or timeout times should also be provided if necessary.
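As a rough sketch of how such an aggregate requirement might be checked, the Python below computes a percentile with the nearest-rank method (one of several common definitions, chosen here as an assumption) and optionally enforces a hard maximum. The sample data and function names are made up for illustration.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_requirement(samples, pct, limit_sec, timeout_sec=None):
    """Check a percentile objective and, optionally, a hard maximum."""
    ok = percentile_nearest_rank(samples, pct) <= limit_sec
    if timeout_sec is not None:
        ok = ok and max(samples) <= timeout_sec
    return ok


# Hypothetical measurements in seconds (not from any real system).
response_times = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 6.5, 1.3, 1.0]
average = sum(response_times) / len(response_times)
p90 = percentile_nearest_rank(response_times, 90)
print(f"average={average:.2f} sec, 90th percentile={p90} sec")
```

In this made-up sample a single 6.5 sec outlier pulls the average above most of the observed values while the 90th percentile stays at 1.4 sec, which is one reason to state a maximum alongside a percentile.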
Throughput is the rate at which incoming requests are completed. Throughput defines the load on the system and is measured in operations per unit of time. It may be the number of transactions per second or the number of adjudicated claims per hour.
Defining throughput may be pretty straightforward for a system doing the same kind of operations all the time, like processing orders or printing reports (although in many cases some further metrics may be necessary, such as the number of items in an order or the size of a report). It may be more difficult for systems where different kinds of load exist: the ratio of different types of requests can change with time and season.
It is also important to see how throughput varies with time. For example, throughput can be defined for a typical hour, a peak hour, and an off-hour for each particular kind of load. In some cases, it is important to detail further what the load is hour by hour.
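One possible way to derive these figures from an hour-by-hour profile is sketched below. The hourly counts are hypothetical, and treating the median hour as "typical" is an assumption; a business may well define it differently.

```python
import statistics

# Hypothetical requests per hour for one kind of load, keyed by hour of day.
hourly_counts = {
    8: 1200, 9: 3400, 10: 5200, 11: 4800, 12: 2100,
    13: 4500, 14: 5000, 15: 4100, 16: 2600, 17: 900,
}

peak_hour = max(hourly_counts, key=hourly_counts.get)
off_hour = min(hourly_counts, key=hourly_counts.get)
typical = statistics.median(hourly_counts.values())  # assumed definition

print(f"peak: {hourly_counts[peak_hour]} requests at hour {peak_hour}")
print(f"off-hour: {hourly_counts[off_hour]} requests at hour {off_hour}")
print(f"typical (median) hour: {typical} requests")
```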
The number of users doesn’t, by itself, define throughput. Without defining what each user is doing and how intensely (i.e. the throughput for one user), the number of users doesn’t make much sense as a measure of load. For example, if there are 500 users each running one short query per minute, we have a throughput of 30,000 queries per hour. If the same 500 users are running the same queries, but one per hour, the throughput is 500 queries per hour. So there are the same 500 users, but a 60-fold difference between the loads (and, respectively, between the hardware requirements for the system).
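The arithmetic behind this comparison is simple enough to write down directly. The function name below is illustrative only.

```python
def throughput_per_hour(users, requests_per_user_per_hour):
    """Total load is the user count multiplied by per-user intensity."""
    return users * requests_per_user_per_hour


every_minute = throughput_per_hour(500, 60)  # one query per minute per user
once_an_hour = throughput_per_hour(500, 1)   # one query per hour per user

print(every_minute, once_an_hour, every_minute / once_an_hour)
```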
Concurrency, the number of users or threads working simultaneously, is important too. Even if users are connected but not active, they still hold some resources.
The terminology is somewhat vague here. Usually three metrics are used:
• Total or named users: all registered or potential users. This is rather a metric of the data the system works with. It also indicates the upper potential limit of concurrency.
• Active or concurrent users: users logged in at a specific moment in time. This is the real measure of concurrency in the sense it is used here.
• Really concurrent users: users actually running requests at the same time. While this metric looks appealing and is used quite often, it is almost impossible to measure and rather confusing: to be exact, the actual number of requests that can be executed simultaneously at any single moment in time is equal to the number of processors.
It is important to understand which users you are speaking about: for some systems, these three numbers of users may differ by a factor of hundreds. Of course, it heavily depends on the nature of the system.
Finding the number of concurrent users for a new system can be tricky. Usually, information about real usage of similar systems can help to make a first estimate.
For batch jobs, it is also important to specify all schedule-related information, such as frequency (how often the job will be run), time window, dependencies on other jobs, and dependent jobs (along with their respective time windows, to see how a change in one job may impact the others).
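To illustrate how this schedule information might be recorded and checked, here is a minimal Python sketch. The `BatchJob` fields, the job names, and the timings are all hypothetical; times are minutes after midnight, and the dependency graph is assumed to have no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class BatchJob:
    name: str
    frequency: str          # e.g. "daily"
    window_start: int       # minutes after midnight
    window_end: int         # minutes after midnight
    expected_minutes: int   # expected processing time
    depends_on: list = field(default_factory=list)


def finish_times(jobs):
    """Earliest finish per job: start at the window start or when the
    last dependency ends, whichever is later (assumes no cycles)."""
    by_name = {job.name: job for job in jobs}
    done = {}

    def finish(name):
        if name not in done:
            job = by_name[name]
            start = max([job.window_start] + [finish(d) for d in job.depends_on])
            done[name] = start + job.expected_minutes
        return done[name]

    for job in jobs:
        finish(job.name)
    return done


def window_violations(jobs):
    """Names of jobs whose earliest finish falls after their window ends."""
    done = finish_times(jobs)
    return [job.name for job in jobs if done[job.name] > job.window_end]


jobs = [
    BatchJob("extract", "daily", 60, 180, 90),
    BatchJob("load", "daily", 120, 240, 60, ["extract"]),
    BatchJob("report", "daily", 180, 240, 45, ["load"]),
]
print(window_violations(jobs))
```

With these made-up figures, "report" cannot start until "load" finishes at minute 210, so it ends at minute 255 and misses its window, even though each job would fit if it ran alone.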
All context is important. Performance isn’t something completely independent. It depends, for example, on the hardware resources provided, the volume of data the system operates on, and the functionality included in the system. So if any of that information is known, it should be specified in the requirements. While the hardware configuration may be determined during the design stage, the volume of data to keep is usually determined by the business and should be specified.
Scalability is the ability of a system to meet performance requirements as demand increases (usually by adding hardware). Scalability requirements may include demand projections for the system, such as increases in the number of users, transaction volumes, and data size, or the appearance of additional types of load, for specific points of time in the future.
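A demand projection of this kind could be tabulated as in the sketch below. The compound annual growth model, the 20% rate, and the starting figure of 500 users are assumptions chosen purely for illustration; real projections should come from the business.

```python
def project(current, annual_growth_rate, years):
    """Project a demand figure forward, assuming compound annual growth."""
    return current * (1 + annual_growth_rate) ** years


# Hypothetical: 500 concurrent users today, growing 20% per year.
projections = {year: round(project(500, 0.20, year)) for year in range(1, 4)}
for year, users in projections.items():
    print(f"year {year}: about {users} users")
```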
