Horizontal vs Vertical Scaling: What the ACE Exam Expects You to Know

Ben Makansi
January 22, 2026

When an application starts struggling under load, there are two ways to give it more capacity: add more instances, or make the existing instances more powerful. On Google Cloud, these are horizontal scaling and vertical scaling, and the distinction shows up throughout the Associate Cloud Engineer exam because different GCP services handle each one differently. Knowing which model applies to which service, and why, makes a real difference on scenario-based questions.

Horizontal Scaling: Adding More Instances

Horizontal scaling distributes your workload across additional compute nodes rather than making individual nodes larger. When traffic increases, you add instances. When it drops, you remove them. The workload spreads across the pool.

On Google Cloud, horizontal scaling is the default model for managed services. Cloud Run adds container instances automatically as incoming requests increase, and removes them when traffic drops, including all the way to zero when nothing is coming in. Managed Instance Groups in Compute Engine support horizontal scaling through autoscaling policies based on CPU utilization, load balancer capacity, or custom metrics. When demand rises, new VM instances are added to the group. When demand falls, they are removed. In GKE, the Horizontal Pod Autoscaler adds pods based on resource usage, and the Cluster Autoscaler adds nodes to the cluster when existing nodes are too full to schedule new pods.
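The sizing decision behind a CPU-based MIG autoscaling policy is easy to reason about: scale the group so average utilization trends toward the configured target. Here is a minimal sketch of that math (illustrative only, not GCP's actual autoscaler implementation, which also applies stabilization windows and cool-down periods):

```python
import math

def mig_target_size(current_size: int, avg_cpu_utilization: float,
                    target_utilization: float, max_size: int,
                    min_size: int = 1) -> int:
    """Approximate the instance count a CPU-utilization autoscaling
    policy aims for: enough instances to bring average utilization
    back toward the target, clamped to the group's size limits."""
    desired = math.ceil(current_size * avg_cpu_utilization / target_utilization)
    return max(min_size, min(desired, max_size))

# 4 instances at 90% average CPU with a 60% target -> scale out to 6
print(mig_target_size(4, avg_cpu_utilization=0.90,
                      target_utilization=0.60, max_size=10))  # 6
```

The same formula explains scale-in: if utilization falls well below the target, the desired size drops and instances are removed, never below the minimum.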

The advantage of horizontal scaling is that it has no practical ceiling, given sufficient GCP quota. You can keep adding instances as demand grows. The constraint is architectural: your application needs to be stateless, or at least designed to handle state externally. If each instance retains information between requests, that state has to live in a database or cache rather than on the instance itself. Stateless workloads like APIs and microservices are well suited to horizontal scaling. Stateful workloads require more careful design.
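The stateless constraint comes down to this: any replica must be able to serve any request. A toy sketch of the difference, where a plain dict stands in for an external store such as Memorystore or a database (names and classes here are illustrative, not a GCP API):

```python
class StatefulInstance:
    """Anti-pattern for horizontal scaling: state lives on the
    instance, so requests routed to different replicas see
    different, diverging counts."""
    def __init__(self):
        self.visits = 0

    def handle(self, session_id: str) -> int:
        self.visits += 1
        return self.visits

class StatelessInstance:
    """Horizontal-scaling friendly: every replica reads and writes
    the same external store, so request routing does not matter."""
    def __init__(self, store: dict):
        self.store = store  # stand-in for Memorystore or a database

    def handle(self, session_id: str) -> int:
        self.store[session_id] = self.store.get(session_id, 0) + 1
        return self.store[session_id]

shared = {}
a, b = StatelessInstance(shared), StatelessInstance(shared)
a.handle("user-1")
b.handle("user-1")
print(shared["user-1"])  # 2: either replica can serve the session
```

With the stateful version, two replicas handling the same session would each report a count of 1, which is exactly the bug that externalized state avoids.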

Vertical Scaling: Making Instances Larger

Vertical scaling means upgrading the compute resources on an existing instance. More vCPUs, more memory, a larger disk. You make the machine bigger rather than adding more machines.

In Compute Engine, vertical scaling means stopping a VM, changing its machine type to a larger one, and restarting it. Moving from an e2-standard-2 to an e2-standard-8 quadruples the vCPU count. This requires downtime, which is a real constraint for production workloads. Compute Engine also supports custom machine types, so if no predefined option fits your requirements you can specify exact vCPU and memory configurations.
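The e2-standard family makes the resize arithmetic easy to check: the suffix is the vCPU count, and standard shapes provision 4 GB of memory per vCPU. A small helper encoding that naming convention (convention only; consult the machine families documentation for current shapes):

```python
def e2_standard_shape(machine_type: str) -> tuple[int, int]:
    """Return (vcpus, memory_gb) for an e2-standard-N machine type.
    E2 standard types carry the vCPU count in the name suffix and
    pair each vCPU with 4 GB of memory."""
    prefix = "e2-standard-"
    if not machine_type.startswith(prefix):
        raise ValueError(f"not an e2-standard type: {machine_type}")
    vcpus = int(machine_type[len(prefix):])
    return vcpus, vcpus * 4

old = e2_standard_shape("e2-standard-2")  # (2, 8)
new = e2_standard_shape("e2-standard-8")  # (8, 32)
print(new[0] // old[0])  # 4: the resize quadruples the vCPU count
```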

Cloud SQL vertical scaling works similarly. If your database instance is running out of memory or CPU, you modify its machine type. The instance restarts during the resize. This is a common pattern for relational database workloads that cannot easily be distributed across multiple nodes the way stateless web traffic can.

GKE has a third option worth knowing: the Vertical Pod Autoscaler (VPA). The VPA adjusts the CPU and memory requests for individual pods based on their actual usage. Applying a new resource configuration may require restarting the pod, a meaningful difference from the Horizontal Pod Autoscaler, which adds or removes pods without restarting existing ones. The VPA is useful for workloads where sizing pods correctly matters more than adding more of them.
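The core idea behind a VPA-style recommendation can be sketched simply: size a pod's request from a high percentile of observed usage plus a safety margin. This is a deliberately simplified model (the real recommender uses decaying histograms; the percentile and margin values here are illustrative, not VPA defaults):

```python
import math

def vpa_style_request(usage_samples: list[float],
                      percentile: float = 0.9,
                      safety_margin: float = 0.15) -> float:
    """Simplified sketch of a VPA-style recommendation: take a high
    percentile of observed resource usage and add a safety margin,
    so the request covers normal load without chasing rare spikes."""
    ordered = sorted(usage_samples)
    idx = min(len(ordered) - 1, math.ceil(percentile * len(ordered)) - 1)
    return round(ordered[idx] * (1 + safety_margin), 3)

# CPU usage in cores observed for one pod; the 0.9 spike is ignored
samples = [0.2, 0.25, 0.3, 0.3, 0.35, 0.4, 0.4, 0.45, 0.5, 0.9]
print(vpa_style_request(samples))  # 0.575
```

Note how the one-off spike to 0.9 cores does not drive the recommendation, which is the point of percentile-based sizing over sizing for the maximum.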

Which Model GCP Favors

Most of GCP's managed and serverless services are designed around horizontal scaling. Cloud Run, App Engine, Cloud Functions, and GKE autoscaling all add instances or pods rather than upgrading individual ones. This fits naturally with how cloud infrastructure works: you pay for what you use, and distributing load across many smaller instances is often more cost-efficient and more resilient than maintaining a single large one.

Vertical scaling appears most often with infrastructure that cannot be easily distributed: relational databases, or single-VM workloads with specific memory or CPU requirements. It is also the fallback when an application does not support horizontal scaling but still needs more capacity.

Scaling and High Availability

One reason GCP services favor horizontal scaling is the relationship between multiple instances and availability. When load is distributed across several instances and one fails, the others continue serving traffic. A single large instance that fails takes the entire workload down with it. Running multiple smaller instances adds a layer of resilience that vertical scaling alone does not provide.

This is why Managed Instance Groups pair autoscaling with autohealing. If a VM instance fails a health check, the MIG replaces it automatically. The group maintains the configured number of healthy instances, which means horizontal scaling and high availability work together rather than in isolation.
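The reconciliation loop a MIG runs can be sketched as: drop instances that fail their health check, then recreate until the group is back at its target size. This is an illustrative model only; the real controller also handles initial-delay grace periods, rolling updates, and per-zone distribution:

```python
def reconcile_instance_group(instances: dict[str, bool],
                             target_size: int,
                             new_instance_ids) -> dict[str, bool]:
    """Sketch of MIG-style autohealing: keep only instances passing
    their health check (True), then create replacements until the
    group is back at target_size."""
    healthy = {name: True for name, ok in instances.items() if ok}
    while len(healthy) < target_size:
        healthy[next(new_instance_ids)] = True
    return healthy

import itertools
ids = (f"vm-{i}" for i in itertools.count(4))
group = {"vm-1": True, "vm-2": False, "vm-3": True}  # vm-2 failed its check
print(sorted(reconcile_instance_group(group, 3, ids)))
# ['vm-1', 'vm-3', 'vm-4']
```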

What the Associate Cloud Engineer Exam Tests

The ACE exam connects horizontal and vertical scaling to service selection and architecture decisions. A scenario describing traffic spikes handled by adding more servers is horizontal scaling. A scenario describing a database running out of memory that needs its instance resized is vertical scaling.

The exam also tests autoscaling configurations specifically. Know which services scale automatically without configuration (Cloud Run, Cloud Functions, App Engine), which require autoscaling to be explicitly set up (Compute Engine MIGs, GKE), and which require a manual resize (a standalone Compute Engine VM with no MIG). These distinctions separate correct answers from plausible-sounding wrong ones.

For deep coverage of GCP compute services, autoscaling configuration, and how scaling concepts appear on the ACE exam, the GCP Study Hub Associate Cloud Engineer course walks through each service in detail with exam-focused examples.
