Performance as an Architectural Concern (Part 2)

This article is part of a series that provides practical advice and guidance on how to leverage the Continuous Architecture approach. All these articles, including the first article on performance, are available on the “Continuous Architecture in Practice” blog. “Performance Part 1” provides a definition of performance, discussing its importance, its relationship with other quality attributes and the architectural forces affecting it. Part 2 is the second article in this “Performance as an Architectural Concern” series and discusses the performance impacts of emerging trends and how to architect applications around performance modeling and testing. Please note that we are just presenting a summary of those topics here, due to space limitations. A full discussion, including a comprehensive list of resources for further reading, is included in Chapter 6 of the “Continuous Architecture in Practice” book.

Performance Impact of Emerging Trends

A number of architecture and technology trends have emerged and have become popular over the last few years. Unfortunately, while highly beneficial in some ways, they have also created new performance challenges.

Microservice Architectures

The first of these trends is the adoption of microservice architectures. Our articles on scalability (available on this blog) include a detailed discussion of microservice architectures, so we provide just a brief summary here. These architectures use loosely coupled components, each with a bounded and cohesive set of responsibilities, and rely on simple integration constructs such as RESTful APIs. Among the characteristics of microservices, size has turned out to be much less important than loose coupling and doing a few specific things well. Microservice architectures are widely used, and supporting tools and processes are widely available. They may be based on open source technologies, support most modern languages, and offer both stateless and stateful models. Unfortunately, using many very granular microservices in order to maximize modifiability can create a performance pitfall, due to the overhead associated with inter-service communication. Addressing this issue may require consolidating some services into larger ones.
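
To see why very granular services can hurt latency, consider a rough back-of-the-envelope model. All numbers below are illustrative assumptions, not measurements from any real system:

```python
# Rough, illustrative model of inter-service call overhead: each remote
# call adds a fixed network + (de)serialization cost on top of the
# useful work performed. All constants are assumptions for illustration.
PER_CALL_OVERHEAD_MS = 5.0   # assumed cost of one remote hop
WORK_PER_STEP_MS = 2.0       # assumed useful processing per step
STEPS = 10                   # processing steps in one user request

def total_latency_ms(services: int) -> float:
    """Latency when STEPS of work are spread across `services` services,
    invoked sequentially (one remote hop per service)."""
    remote_hops = services
    return STEPS * WORK_PER_STEP_MS + remote_hops * PER_CALL_OVERHEAD_MS

fine_grained = total_latency_ms(services=10)  # one service per step
consolidated = total_latency_ms(services=2)   # steps grouped into 2 services

print(f"10 fine-grained services: {fine_grained:.0f} ms")
print(f"2 consolidated services:  {consolidated:.0f} ms")
```

Even with these made-up numbers, the fixed per-call overhead dominates once work is spread across many sequential remote hops, which is why consolidating chatty services often improves latency.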

NoSQL Technology

The second trend is the growing adoption of NoSQL technology. Traditional relational databases and their performance tactics have limitations when processing the workloads generated in the Internet, social media, and e-commerce era. Consequently, companies such as Google, Amazon, Facebook, and Netflix started experimenting with other database architectures. This effort resulted in the emergence of NoSQL databases, which address specific performance challenges. Selecting the appropriate NoSQL database technology for a specific performance problem requires a clear understanding of the system's read and write patterns. Using a NoSQL database technology that is inappropriate for the performance issue being solved will create a performance pitfall of its own.

Another data trend is the growing adoption of big data architectures. This trend is related to the previous one, as big data architectures often use NoSQL database technologies. Big data systems share some common requirements, including write-heavy workloads, variable request loads, computation-intensive analytics and high availability. These requirements create some performance challenges, and the tactics discussed in the next article in this series can be extended to big data systems.

Public and/or Commercial Clouds

The wide adoption of public and/or commercial clouds and the emerging use of serverless architectures have an impact on how architects deal with performance in modern software systems. Those two topics are discussed in our scalability articles, so we provide just a brief summary here and cover their impact on performance.

Public and/or commercial clouds provide a number of important capabilities, such as the ability to pay as resources are being used and to rapidly scale when required, especially when containers are being used. They offer the promise of allowing a software system to handle unexpected workloads at an affordable cost without any noticeable disruption in service to the software system’s customers and, as such, are powerful mechanisms to help meet performance goals. However, a common fallacy is that performance is the problem of the cloud provider. Performance issues caused by poor application design are not likely to be addressed by running the software system in a container on a commercial cloud, especially if the architecture of the system is old and monolithic. Attempting to solve this kind of performance challenge by leveraging a commercial cloud probably will be neither successful nor cost effective.

The location of the data is another important consideration for software systems located in a commercial cloud. Latency issues may occur unless the data is collocated with the application, especially if the data needs to be accessed frequently. Integration patterns between components in the cloud and on premises not only can cause performance degradation but also can result in significant cost impacts, because cloud providers all have a model of charging for egress[1] costs.
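
As a rough illustration of why egress charges matter, the sketch below estimates a monthly egress bill for an on-premises component that frequently pulls data from the cloud. The per-GB price is an assumption for illustration only; actual pricing varies by provider, region, and volume tier:

```python
# Illustrative egress-cost estimate. The per-GB price is an assumption,
# not any specific provider's published rate.
EGRESS_PRICE_PER_GB = 0.09  # assumed USD per GB transferred out

def monthly_egress_cost(requests_per_day: int, kb_per_response: float) -> float:
    """Approximate monthly egress charge, assuming a 30-day month."""
    gb_per_month = requests_per_day * 30 * kb_per_response / (1024 * 1024)
    return gb_per_month * EGRESS_PRICE_PER_GB

# e.g. an on-premises component pulling 200 KB responses from the cloud
cost = monthly_egress_cost(requests_per_day=1_000_000, kb_per_response=200)
print(f"Estimated monthly egress cost: ${cost:,.2f}")
```

At a million requests per day, even modest response sizes add up to hundreds of dollars per month in transfer charges alone, before any performance impact is considered.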

Serverless Architectures

These architectures use a cloud-based model. In this model, the cloud provider manages the computing environment as well as the allocation of resources for running customer-written functions. Software engineers can ignore infrastructure concerns such as provisioning servers, maintaining their software, and operating the servers. However, lack of control over the environment may make performance tuning harder.

One of the advantages of serverless architectures is the ability to run application code from anywhere—for example, on edge servers close to end users—assuming that the cloud provider allows it. This provides an important performance advantage by decreasing latency. It is especially effective when serverless architectures use large commercial vendors with data centers located around the world. Because serverless functions are more granular than microservices, serverless architectures tend to be highly distributed, which may create a performance pitfall similar to the one we discussed for microservices. Performance can be improved by grouping some of the smaller functions into larger ones. In addition, the serverless event-driven model may create design challenges for architects who are not used to this model, and these design challenges could cause performance issues. If not done carefully, utilizing serverless functions can create a modern spaghetti architecture where dependencies between functions become too complex and unmanageable. Ensuring that appropriate design skills are available within the team is essential to mitigating this risk.
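
The grouping idea can be sketched as follows. Instead of deploying `validate` and `enrich` as two separate serverless functions, with an extra invocation hop (and potentially an extra cold start) between them, a single function handles the whole step in-process. The function names and event shape are illustrative assumptions, not a specific provider's API:

```python
# Hedged sketch of consolidating two fine-grained serverless functions
# into one handler. Names and event shape are illustrative only.
def validate(order: dict) -> dict:
    """First processing step: reject malformed orders."""
    if "id" not in order:
        raise ValueError("order missing id")
    return order

def enrich(order: dict) -> dict:
    """Second processing step: add derived attributes."""
    return {**order, "priority": "standard"}

def handle_order(event: dict) -> dict:
    """One consolidated function: both steps run in-process, avoiding a
    second function invocation round trip between them."""
    return enrich(validate(event["order"]))

print(handle_order({"order": {"id": 42}}))
```

The tradeoff is the same one discussed for microservices: fewer, larger functions reduce invocation overhead at the cost of some deployment-time granularity.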

Architecting Applications around Performance Modeling and Testing

How does Continuous Architecture help software architects and engineers ensure that a system’s performance is adequate? CA Principle 5, “Architect for build, test, deploy, and operate,” directs them to use a continuous testing approach and to make system tests, such as functionality and performance tests, an integral part of the team’s software system deployment pipeline.

In addition to the continuous architecture principles and essential activities discussed in this series of articles, architects can use various tools to ensure that the performance of a software system meets or exceeds the requirements from its stakeholders. Those tools include a number of performance tactics (described in the Part 3 article of this “Performance as an Architectural Concern” series), as well as performance modeling and performance testing. In addition, analysis of performance testing results enables architects to rapidly modify their design to eliminate performance bottlenecks if necessary.

Performance Modeling

Performance modeling should be initiated as early as possible in the software system’s development and remain an integral part of it. It includes the following components:

  • A performance model that provides an estimate of how the software system is likely to perform against different demand factors. Its purpose is to estimate the performance (specifically the latency) of the components of the software system, based on the structure of the software system components, the expected request volume, and the characteristics of the production environment. Implementation options for performance models include using a spreadsheet, an open source or commercial modeling tool, or even custom code.
  • A data capture and analysis process that enables the team to refine the model based on performance test results. Performance measurements obtained from the performance tests are compared to the model’s predictions and used to adjust the model parameters.
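
As an example of the “custom code” option, a first-cut performance model can be as simple as an M/M/1 queueing approximation applied to each component. This is a standard textbook approximation, shown here as a sketch; all parameter values are assumptions:

```python
# Minimal performance-model sketch using an M/M/1 queueing approximation:
# mean latency = service time / (1 - utilization) for a single-server
# queue with random arrivals. A first cut, not a calibrated model.
def mm1_mean_latency_ms(arrival_rate_per_s: float, service_time_ms: float) -> float:
    """Mean latency (queueing wait + service) for one component.
    Returns infinity when demand exceeds the component's capacity."""
    service_rate_per_s = 1000.0 / service_time_ms
    utilization = arrival_rate_per_s / service_rate_per_s
    if utilization >= 1.0:
        return float("inf")  # saturated: the queue grows without bound
    return service_time_ms / (1.0 - utilization)

# Estimate latency of a hypothetical component at two demand levels.
print(mm1_mean_latency_ms(arrival_rate_per_s=50, service_time_ms=10))  # moderate load
print(mm1_mean_latency_ms(arrival_rate_per_s=95, service_time_ms=10))  # near saturation
```

Even this crude model captures the key nonlinearity: latency degrades gently at moderate utilization and explodes as a component approaches saturation, which is exactly the behavior the data capture process should calibrate against measured test results.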

Using a performance model and process focuses software architects and engineers on performance and enables them to make informed architecture decisions based on tradeoffs between performance and other quality attributes, such as modifiability and cost.

Performance Testing

The purpose of performance testing is to measure the performance of a software system under normal and expected maximum load conditions. For software systems deployed on a commercial cloud platform, performance testing needs to take place on the same commercial cloud platform, in an environment that is as close as possible to the production environment. An infrastructure-as-code approach to managing environments makes this relatively straightforward to achieve, as well as ensuring reliable environment creation and updates.

The performance testing process in a continuous delivery environment should be fully integrated with software development activities and should be as automated as possible. The goal is to shift performance testing activities left, identify performance bottlenecks as early as possible, and take corrective action. An additional benefit is that the stakeholders are kept informed of the team’s progress and are aware of problems as they emerge. There are several types of performance tests, including the following ones:

  • Normal load testing: Verifies the behavior of the system under the expected normal load to ensure that its performance requirements are met. Load testing enables us to measure performance metrics such as response time, responsiveness, turnaround time, throughput, and cloud infrastructure resource utilization levels. Cloud infrastructure resource utilization levels can be used to predict the operational cost of the system.
  • Expected maximum load testing: Similar to the normal load testing process, ensures that the system still meets its performance requirements under expected maximum load.
  • Stress testing: Evaluates the system behavior when processing loads beyond the expected maximum. This test also uncovers “rare” system issues that occur only under very high load and helps to establish how resilient the system is.
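
As a minimal illustration of the load-testing idea, the sketch below drives a stand-in request handler with concurrent users and reports the median and 95th-percentile response times. The handler is a placeholder assumption; a real test would target the actual system through a dedicated load-testing tool:

```python
# Minimal load-test harness sketch. `handle_request` is a stand-in for
# the system under test, simulating ~5 ms of work per request.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request() -> float:
    """Stand-in for the system under test; returns response time in ms."""
    start = time.perf_counter()
    time.sleep(0.005)  # simulate 5 ms of work
    return (time.perf_counter() - start) * 1000

def run_load_test(concurrent_users: int, requests_per_user: int) -> dict:
    """Drive the handler with a pool of concurrent users and summarize
    the measured response-time distribution."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(lambda _: handle_request(), range(total)))
    timings.sort()
    return {
        "requests": len(timings),
        "median_ms": statistics.median(timings),
        "p95_ms": timings[int(0.95 * len(timings)) - 1],
    }

print(run_load_test(concurrent_users=10, requests_per_user=20))
```

Varying `concurrent_users` turns the same harness into a normal load test, an expected-maximum load test, or a stress test, which is why teams typically parameterize one test suite rather than maintain three.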

In practice, the team needs to run a combination of these tests on a regular basis to identify the performance characteristics of a software system. Normal load tests are run more frequently than the other two types. Finally, it may become infeasible to run comprehensive performance tests for software systems that have very large deployment footprints and/or that manage massive datasets, such as systems used by some social media or e-commerce companies. Examples of testing strategies used by those companies in the production environment include canary testing and a chaos engineering approach with controlled failures, implemented using automation technology.

The third and final article in this “Performance as an Architectural Concern” series will discuss the architectural tactics available to deal with performance requirements, so we hope that you find those articles useful and you will keep on reading them!

For more information on our new book, “Continuous Architecture in Practice”, which discusses performance in much more detail, please visit our website at continuousarchitecture.com.


[1] Egress refers to data sent from a cloud environment to on-premises environments.
