Scalability as an Architectural Concern (Part 2)

This article is part of a series that provides practical advice and guidance on how to leverage the Continuous Architecture approach. These articles are available on the “Continuous Architecture in Practice” website. The first article, “Scalability Part 1[1],” started the discussion of scalability by providing a definition and discussing its importance, its relationship with other quality attributes, and the forces affecting it. Today we will continue our discussion of scalability, covering what has changed, the types of scalability, and the impact of cloud computing on this complex system quality.

What Has Changed: The Assumption of Scalability

Scalability was not a major concern decades ago, when applications were mainly monolithic computer programs running on large Unix servers or on mainframes, and comparatively few transactions would be operating close to real time, with almost no direct customer access. Volumes and rate of change, as well as costs of building and operating those systems, could be predicted with reasonable accuracy. Business stakeholders’ expectations were lower than they are today, and the timeframe for software changes was measured in years and months rather than days, hours, and minutes. In addition, the businesses of the past were usually less scalable owing to physical constraints. Today, unexpected transaction volumes can arrive at any time, as there is no human buffer between the systems and the customers.

Software systems have evolved from monoliths to distributed software systems running on multiple servers. The emergence and rapid acceptance of the Internet enabled applications to be globally connected. The next evolution was the appearance and almost universal adoption of cloud computing, with its promise of providing a flexible, on-demand infrastructure at a predictable, usage-based optimal cost. Software architects and engineers quickly realized that even if monolithic software systems could be ported with minimal architecture changes to distributed environments and then to cloud infrastructures, they would run inefficiently, suffer scalability problems, and be expensive to operate. Monolithic architectures had to be refactored into service-oriented architectures and then into microservice-based architectures. Each architecture evolution required a system rewrite using new technologies, including new computer languages, middleware, and databases. As a result, software architects and engineers are now expected to learn new technologies almost constantly and to rapidly become proficient at using them. The next evolution of software systems is the move toward intelligent connected systems. Artificial intelligence (AI)-based technologies are fundamentally changing the capabilities of our software systems. In addition, modern applications can now directly interface with intelligent devices such as home sensors, wearables, industrial equipment, monitoring devices, and autonomous cars, to name a few.

Each evolution of software systems has increased their accessibility and therefore the load that they need to be able to handle. In a span of about 40 years, they evolved from software operating in batch mode, with narrow capabilities, into ubiquitous utilities that are indispensable to our daily lives. As a result, estimating transaction volumes, data volumes, and number of users has become extremely difficult. How could anyone predict the traffic volumes caused by extraordinary events such as health incidents (e.g., COVID-19 pandemic) when suddenly became one of the few stores still selling everyday goods during country and state lockdowns?

Types of Scalability

Calling a system scalable is a common oversimplification. Scalability is a multidimensional concept that needs to be qualified, as it may refer to application scalability, data scalability, or infrastructure scalability, to name a few of many possible types.

Unless all of the components of a system are able to cope with the increased workload, the system cannot be considered scalable. Assessing the scalability of a software system involves discussing scenarios: Will the system be able to cope with an unexpected transaction volume increase of 100 percent over the estimates? Even if the system can’t scale, could it fail gracefully? (Many security exploits rely on hitting the system with unexpected load and then taking over when the application fails.) Will the platform be able to support a significant number of customers beyond the initial estimates without any major changes to the architecture? Will the team be able to rapidly add computing resources at a reasonable cost if necessary?

Software architects and engineers generally use two key mechanisms to enable a software system to respond to workload changes: vertical scalability and horizontal scalability.

Vertical scalability, or scaling up, involves handling volume increases by running an application on larger, more powerful infrastructure resources. This scalability strategy was commonly used when monolithic applications were running on large servers such as mainframes or big Unix servers such as the Sun E10K. Changes to the application software or the database may be required when workloads increase in order to utilize increased server capacity (e.g., to take advantage of an increase in server memory). Scalability is handled by the infrastructure, provided that a larger server is available, is affordable, and can be provisioned quickly enough to handle workload increases, and that the application can take advantage of the infrastructure.

This may be an expensive way to handle scaling, and it has limitations. However, it is not necessarily a bad strategy. Vertical scaling is the only option for some problems, such as scaling in-memory data structures (e.g., graphs), and it can be cost effective if the workload does not change quickly. The challenge is matching processing, memory, storage, and input/output (I/O) capacity to avoid bottlenecks.

Horizontal scalability, or scaling out, refers to scaling an application by distributing it on multiple nodes. This technique is often used as an alternative to a vertical scalability approach, although it is also used to reduce latency and improve fault tolerance, among other goals. Several approaches may be used or even combined to achieve this. These classical approaches are still valid options, but containers and cloud-based databases provide new alternatives with additional flexibility:

  • The simplest approach involves segregating incoming traffic by some sort of partitioning, perhaps a business transaction identifier hash; by a characteristic of the workload (e.g., the first character of the security identifier); or by user group. This is a similar approach to the one used for sharding databases. Using this option, a dedicated set of infrastructure resources, such as containers, handles a specific group of users and their data.
  • A second, more complex approach involves cloning the compute servers and replicating the databases. Incoming traffic is distributed across the servers by a load balancer. Still, there are data challenges associated with this approach. All the data updates are usually made to one of the databases (the master database) and then cascaded to all the others using a data replication mechanism. This process may cause update delays and temporary discrepancies between the databases. If the volume or frequency of database writes is high, it may also cause the master database to become a bottleneck.
  • A third, even more complex approach to horizontal scalability involves splitting the functionality of the application into services and distributing services and their associated data on separate infrastructure resource sets, such as containers. This works well for architectural designs based on a set of services organized around business domains, following the Domain-Driven Design approach, and deployed in cloud-based containers. Using this approach, data replication would be minimal if the data is associated with services and organized around business domains.
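
The first approach in the list above, routing traffic by a hash of a business transaction identifier, can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the partition names, the `partition_for` function, and the `TX-...` identifiers are all hypothetical, and the choice of SHA-256 with a simple modulo is an assumption made for clarity.

```python
import hashlib

# Hypothetical partition names. In practice each one would stand for a
# dedicated set of containers plus the data for its group of users.
PARTITIONS = ["partition-0", "partition-1", "partition-2", "partition-3"]


def partition_for(transaction_id: str, partitions=PARTITIONS) -> str:
    """Map a business transaction identifier to exactly one partition.

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings and so would route the
    same identifier differently after a restart.
    """
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]


if __name__ == "__main__":
    for tx in ["TX-1001", "TX-1002", "TX-1003"]:
        print(tx, "->", partition_for(tx))
```

The same identifier always lands on the same partition, which is what keeps a group of users and their data together. One caveat with the modulo step: changing the number of partitions remaps most identifiers, so a scheme such as consistent hashing may be preferable when partitions are added or removed often.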

The Effect of Cloud Computing

Commercial cloud platforms provide a number of important capabilities, such as the ability to pay for resources as they are used and to rapidly scale when required. The latter is especially true when containers are being used, because infrastructure resources such as virtual machines may take significant time to start. The result may be that the system experiences issues while processing workloads for several minutes if capacity is reduced and then must be rapidly increased again. This is often a cost tradeoff, and it is one of the reasons containers have become so popular. They are relatively inexpensive in terms of runtime resources and can be started relatively quickly.

Cloud computing offers the promise of allowing an application to handle unexpected workloads at an affordable cost without any noticeable disruption in service to the application’s customers, and as such, cloud computing is a very good option to enable scaling. However, designing for the cloud means more than packaging software into virtual machines or containers.

Although vertical scalability can be leveraged to some extent, horizontal scalability (called elastic scalability in cloud environments) is the preferred approach with cloud computing. Still, there are at least two concerns that need to be addressed with this approach. First, in a pay-per-use context, unused resources should be released as workloads decrease, but not too quickly or in response to brief, temporary reductions. As we saw earlier in this section, using containers is preferable to using virtual machines in order to achieve this goal. Second, instantiating and releasing resources should be automated wherever possible to keep the cost of scalability as low as possible. Horizontal scalability approaches can carry a hidden cost in terms of the people required to operate all the resources required to handle the workload unless the operation of that infrastructure is fully automated.
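
The first concern, releasing resources as workloads decrease but not in response to brief dips, amounts to a cooldown rule. The sketch below is one possible way to express it, under stated assumptions: the `ScaleDownGate` class, its threshold, and its cooldown period are hypothetical names and values, utilization is assumed to be a fraction between 0 and 1, and the caller is assumed to supply timestamps in seconds.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScaleDownGate:
    """Allow releasing resources only after load has stayed low for a while."""

    low_threshold: float     # utilization below this counts as "low"
    cooldown_seconds: float  # how long load must stay low before releasing
    _low_since: Optional[float] = field(default=None, init=False, repr=False)

    def should_release(self, utilization: float, now: float) -> bool:
        if utilization >= self.low_threshold:
            # Load recovered, so the earlier dip was only temporary.
            self._low_since = None
            return False
        if self._low_since is None:
            # First low reading: start the cooldown clock.
            self._low_since = now
        return now - self._low_since >= self.cooldown_seconds


if __name__ == "__main__":
    gate = ScaleDownGate(low_threshold=0.3, cooldown_seconds=300)
    print(gate.should_release(0.2, now=0))    # False: dip just started
    print(gate.should_release(0.5, now=180))  # False: load recovered, clock reset
    print(gate.should_release(0.2, now=200))  # False: new dip starts
    print(gate.should_release(0.2, now=500))  # True: low for 300 seconds
```

Running a check like this from an automated control loop, rather than by hand, also speaks to the second concern about keeping the operational cost of scaling low.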

Leveraging containers to deploy a suitably designed software system on cloud infrastructure has significant benefits over using virtual machines, including better performance, less memory usage, faster startup time, and lower cost. Design decisions such as packaging services as containers running in container orchestrators like Kubernetes help to ensure that software can be deployed on, and will run well on, cloud infrastructures.

In addition, structuring a software system as a set of independent runtime services that communicate only through well-defined interfaces allows software architects and engineers to leverage horizontal scalability approaches. Horizontal scalability approaches rely on using a load balancer of some sort, for example, an API gateway for inbound traffic and a service mesh for inter-service traffic. In a commercial cloud, the costs associated with the load balancer itself are driven by the number of new and active requests and the data volumes processed.

An elastic load balancer is a useful tool for managing scalability costs. This type of load balancer constantly adjusts the number of containers according to workload. Using this tool, infrastructure costs are minimized when workloads are small, and additional resources (and associated costs) are automatically added to the infrastructure when workloads increase. In addition, implementing a governance process to review the configurations of each infrastructure element periodically would ensure that each element is optimally configured for the current workload.
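
To make "adjusts the number of containers according to workload" concrete, here is a small sketch of a proportional sizing rule. It is an illustrative assumption, not the algorithm of any particular cloud product: the `desired_containers` function, its parameters, and the default bounds are hypothetical. The idea is to pick the container count that would bring the observed per-container load back to a target value, within a configured minimum and maximum.

```python
import math


def desired_containers(current: int, observed_load: float, target_load: float,
                       minimum: int = 1, maximum: int = 20) -> int:
    """Return a container count that brings per-container load near the target.

    observed_load and target_load are per-container figures in the same unit
    (for example, requests per second per container).
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Scale the current count by how far the observed load is from the target.
    wanted = math.ceil(current * observed_load / target_load)
    # The bounds cap cost at the top and keep a baseline capacity at the bottom.
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    print(desired_containers(4, observed_load=90, target_load=60))   # 6
    print(desired_containers(4, observed_load=15, target_load=60))   # 1
    print(desired_containers(10, observed_load=300, target_load=60)) # 20 (capped)
```

The `maximum` bound is where the cost discussion enters: it limits how much infrastructure spend a traffic spike can trigger, at the price of degraded service beyond that point.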

An additional concern is that commercial cloud environments also may have scalability limits. Software architects and engineers need to be aware of those limitations and create strategies to deal with them, for example, by being able to rapidly port their applications to a different cloud provider if necessary. The tradeoff to such an approach is that you end up leveraging fewer of the cloud-native capabilities of the cloud provider of choice.

The third and final article in this “Scalability as an Architectural Concern” series will discuss the architectural tactics available to deal with scalability requirements. We hope that you find these articles useful and that you will keep on reading them!

For more information on our new book, “Continuous Architecture in Practice,” which discusses scalability in much more detail, please visit our website.

