Performance as an Architectural Concern (Part 1)

This article is part of a series that provides practical advice and guidance on how to leverage the Continuous Architecture approach. We start with performance: we define it, explain why it matters, explore its relationship with other quality attributes, and discuss the architectural forces that affect it.

Performance in the Architectural Context

What exactly do we mean by performance in architecture terms? Performance can be defined as the ability of a system to achieve its timing requirements, using the available resources, under the expected full-peak load. In this definition, expected full-peak load could include high transaction volumes, a large number of users, or even additional transactions as a result of a software change. Performance and scalability are certainly related, but they are distinct. To illustrate this point, consider the checkout process in an e-commerce site. Performance is how quickly the system can process one customer’s purchases. Scalability is having multiple queues of customers and being able to add and remove checkout processes as the number of customers changes.

Most systems have performance concerns; only systems with significantly variable workloads have scalability concerns. In addition, performance is concerned mainly with throughput and latency (i.e., response time). Scalability also deals with response time, but it is concerned with other aspects as well, such as the utilization of storage and other computational resources. Performance degradation as the workload of a software system increases may be the first indicator that the system has scalability issues.

Forces Affecting Performance

One way to look at performance is as a contention-based model, in which the system’s performance is determined by its limiting constraints, such as its operating environment. As long as the system’s resource utilization stays within those constraints, performance remains predictable and roughly linear; once resource utilization exceeds one or more constraints, response time increases sharply. Architectural decisions influence how these forces interact. Performance is optimized when demand for resources does not push resource utilization past its constraints. When resource demand overwhelms resource supply, performance degrades dramatically, which has a cost in terms of customer experience and, potentially, market value. On the other hand, when resource supply far exceeds resource demand, the organization has overbought and needlessly increased its cost.
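The sharp rise in response time near a constraint can be illustrated with a small sketch. It uses the textbook single-server M/M/1 queueing formula, R = S / (1 - ρ), which is an assumption brought in for illustration and not a model taken from this article; the 100 ms service time and the function name are likewise hypothetical.

```python
def response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    Assumption: a single server with random (Poisson) arrivals and
    exponentially distributed service times. Real systems differ, but
    the shape of the curve is similar.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue grows without bound")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.1  # hypothetical: 100 ms of work per request

if __name__ == "__main__":
    for rho in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
        r = response_time(SERVICE_TIME_S, rho)
        print(f"utilization {rho:.0%}: mean response time {r:.2f} s")
```

Under these assumptions, going from 50% to 90% utilization multiplies the mean response time by five, and the last few percent before saturation cost far more than that.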

Resource demand is often variable and unpredictable, especially in web and e-commerce software systems, where demand can increase suddenly and without warning. Resource supply may be hard to increase quickly: memory, disk space, and computing power have traditionally been fixed in the very short term and changeable only through physical upgrades. Cloud-based architectures, especially serverless architectures, are relaxing many of these constraints. While scalability is more concerned with managing resource supply, performance is about controlling resource demand by using a number of architecture tactics. Those tactics will be discussed in the second article in this series.

Architectural Concerns

When we discuss performance, we are concerned with timing and computational resources, and we need to define how to measure those two variables. To evaluate the performance of a system, it is critical to obtain clear, realistic, and measurable objectives from our business partners. Two groups of measurements are usually monitored for this quality attribute.

The first group of measurements defines the performance of the system in terms of timings from the end-user viewpoint under various loads (e.g., full-peak load, half-peak load). The requirement may be stated using the end-user viewpoint; however, the measurements should be made at a finer-grained level (e.g., for each service or computational resource). The software system load is a key component of this measurement set, as most software systems have an acceptable response time under light load. Examples of the measurements included in this group are as follows (please note that others may define these terms differently, which illustrates the need to operationalize definitions using a method such as quality attribute scenarios[1]):

  • Response time/latency: Length of time it takes for a specified interaction with the system to complete. It is typically measured in fractions of seconds.
  • Turnaround time: Time taken to complete a batch of tasks. It is typically measured in seconds, minutes, or even hours.
  • Throughput: Amount of workload a system is capable of handling in a unit time period. It can be measured as a number of batch jobs or tasks per time interval or even by number of simultaneous users that can be handled by the system without any performance issue.
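As a minimal sketch of how the first two kinds of figures might be derived from raw timing data, the following computes response-time percentiles and throughput from a list of request start and end timestamps. The `Request` type, the nearest-rank percentile method, and the choice of the 50th and 95th percentiles are assumptions made for illustration; they are one way to operationalize the definitions above, not the only one.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Request:
    """One completed interaction (hypothetical record type)."""
    start_s: float  # time the request was received, in seconds
    end_s: float    # time the response was sent, in seconds


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests: List[Request]) -> Dict[str, float]:
    """Response-time percentiles and throughput over the observed window."""
    latencies = [r.end_s - r.start_s for r in requests]
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    if window_s <= 0:
        raise ValueError("observation window must be longer than zero")
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        "throughput_rps": len(requests) / window_s,  # completed requests per second
    }
```

Reporting a percentile alongside the median matters because a system can look healthy on average while a meaningful share of users see much slower responses under load.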

The second group of measurements defines the computational resources used by the software system for the load and assesses whether the planned physical configuration of the software system is sufficient to support the expected usage load plus a safety margin. This measurement can be complemented by stress testing the system (i.e., increasing the load until the performance of the system becomes unacceptable or the system stops working). In addition, performance may be constrained by the end user’s computing environment if some components execute on that environment. This would typically happen if the system includes software components (e.g., JavaScript) that execute within browsers running on end-user computing resources. Performance, scalability, usability, availability, and cost all have significant impacts on each other, as illustrated by the simple diagram below.

Relationships Between Quality Attributes

Cost effectiveness is not commonly included in the list of quality attribute requirements for a software system, yet it is almost always a factor that must be considered. In Continuous Architecture, cost effectiveness is an important consideration. Architectural decisions for performance could have a significant impact on deployment and operations costs and may require tradeoffs between performance and other attributes, such as modifiability. For example, a software architect may decide to apply CA Principle 4, Architect for change—leverage the “power of small,” in order to maximize modifiability. Unfortunately, using a lot of smaller services could have a negative impact on performance, as calls between service instances may be costly in terms of execution time because of the overhead of generating and processing interservice requests. It may be necessary to regroup some of those services into larger services to resolve performance issues.
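The tradeoff between small services and performance can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a deliberately simple model in which a request's latency is its compute time plus a fixed overhead for each sequential interservice call; the function name and all of the numbers are hypothetical.

```python
def request_latency_ms(compute_ms: float, sequential_calls: int, overhead_per_call_ms: float) -> float:
    """Latency under a simple additive model (an assumption):
    total = compute time + number of sequential interservice calls * overhead per call.
    """
    if sequential_calls < 0:
        raise ValueError("sequential_calls must not be negative")
    return compute_ms + sequential_calls * overhead_per_call_ms


# Hypothetical figures: 40 ms of actual work, 5 ms of serialization and
# network overhead for each interservice call.
fine_grained = request_latency_ms(compute_ms=40, sequential_calls=8, overhead_per_call_ms=5)
regrouped = request_latency_ms(compute_ms=40, sequential_calls=2, overhead_per_call_ms=5)

if __name__ == "__main__":
    print(f"fine-grained services: {fine_grained:.0f} ms")  # 40 + 8 * 5 = 80 ms
    print(f"regrouped services:    {regrouped:.0f} ms")     # 40 + 2 * 5 = 50 ms
```

In this example the work done is identical, yet regrouping eight call hops into two removes 30 ms per request. Whether that gain justifies the loss in modifiability is exactly the kind of tradeoff the architect has to weigh.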

The next article in this series will continue to discuss the architectural aspects of performance, including the main architectural tactics available for dealing with performance requirements. We hope that you find these articles interesting and that you will keep reading them!

For more information on our new book, “Continuous Architecture in Practice”, which discusses performance in much more detail, please visit our website at

[1] See for example for more information on quality attribute scenarios
