Horizontal vs Vertical Scaling on GCP for the PCA Exam

Ben Makansi

May 4, 2026

Horizontal scaling, sometimes called scaling out, means adding more servers, instances, or nodes to your system to spread the workload across them. If one Compute Engine VM is struggling under load, you add a second, a third, a tenth, and put a load balancer in front. Each instance handles a slice of the traffic. The system grows wider.
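To make the "each instance handles a slice" idea concrete, here is a minimal Python sketch of a round-robin balancer. It is a simplified model and not how Cloud Load Balancing actually chooses backends, which also considers health checks, capacity, and locality. The instance names `vm-1`, `vm-2`, and `vm-3` are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Simplified model: hands each request to the next instance in a fixed rotation."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("need at least one instance")
        self._rotation = cycle(list(instances))

    def route(self, request):
        # The request content is ignored; only the rotation order matters here.
        return next(self._rotation)


def distribute(requests, instances):
    """Return how many requests each (hypothetical) instance receives."""
    balancer = RoundRobinBalancer(instances)
    counts = {name: 0 for name in instances}
    for request in requests:
        counts[balancer.route(request)] += 1
    return counts


if __name__ == "__main__":
    # One VM takes all nine requests; scaling out to three splits them evenly.
    print(distribute(range(9), ["vm-1"]))
    print(distribute(range(9), ["vm-1", "vm-2", "vm-3"]))
```

With one instance, all nine requests land on `vm-1`. With three, each instance gets three.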

Vertical scaling, sometimes called scaling up, means making the servers you already have more powerful. Same number of instances, more CPU, more memory, sometimes more GPU. The system grows taller.

When each pattern fits

Horizontal scaling is the better fit when you want to distribute traffic and keep the system reliable across multiple servers. If one node fails, the others keep serving requests. You also avoid the ceiling problem, because there is always another instance you can add. The trade-off is that the application has to tolerate running on many machines at once, which usually means stateless workloads or workloads that store state somewhere external.
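The stateless requirement is easier to see side by side. The Python sketch below models two instances behind a balancer. In one design, each user's cart lives in local memory. In the other, it lives in a shared external store. `ExternalStore` is a hypothetical in-process stand-in for something like a managed cache or database, and the cart example is illustrative only.

```python
class ExternalStore:
    """Hypothetical stand-in for an external state store shared by all instances."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class StatelessInstance:
    """Keeps no per-user state in memory; every read and write goes to the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, user, item):
        cart = list(self.store.get(user, []))
        cart.append(item)
        self.store.set(user, cart)
        return cart


class StatefulInstance:
    """Keeps carts in local memory, so a user's data lives on one machine only."""

    def __init__(self, name):
        self.name = name
        self._carts = {}

    def add_to_cart(self, user, item):
        cart = self._carts.setdefault(user, [])
        cart.append(item)
        return list(cart)


if __name__ == "__main__":
    store = ExternalStore()
    a, b = StatelessInstance("vm-1", store), StatelessInstance("vm-2", store)
    print(a.add_to_cart("alice", "book"))  # ['book']
    print(b.add_to_cart("alice", "pen"))   # ['book', 'pen']: vm-2 sees vm-1's write

    c, d = StatefulInstance("vm-1"), StatefulInstance("vm-2")
    print(c.add_to_cart("alice", "book"))  # ['book']
    print(d.add_to_cart("alice", "pen"))   # ['pen']: the book exists only on vm-1
```

If the balancer sends a user's second request to a different instance, only the stateless design still sees the whole cart.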

Vertical scaling is the better fit when you need more raw power on a single node without changing the architecture. You keep the same instance count, you just give each instance a larger machine type. This is the simpler operational change because nothing about the topology shifts. The trade-off is that every machine type has a maximum size, and a single beefier instance is still a single point of failure.
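The ceiling trade-off can be shown with a small Python sketch that scales up by picking the smallest machine type that fits a workload. The list is a deliberately small subset limited to the E2 standard shapes (2 to 32 vCPUs, 4 GB of memory per vCPU). Other machine families go much larger, but each still has a maximum, and that limit is what the sketch illustrates. `scale_up` is a hypothetical helper, not a Compute Engine API.

```python
# Subset of machine shapes for illustration: (name, vCPUs, memory in GB), smallest first.
MACHINE_TYPES = [
    ("e2-standard-2", 2, 8),
    ("e2-standard-4", 4, 16),
    ("e2-standard-8", 8, 32),
    ("e2-standard-16", 16, 64),
    ("e2-standard-32", 32, 128),
]


def scale_up(required_vcpus, required_memory_gb):
    """Return the smallest listed machine type that fits, or None if none is large enough."""
    for name, vcpus, memory_gb in MACHINE_TYPES:
        if vcpus >= required_vcpus and memory_gb >= required_memory_gb:
            return name
    # Ceiling reached: no single machine in this list can absorb the load.
    return None


if __name__ == "__main__":
    print(scale_up(6, 20))    # e2-standard-8
    print(scale_up(4, 40))    # e2-standard-16 (memory, not CPU, drives the choice)
    print(scale_up(48, 100))  # None
```

Once `scale_up` returns `None`, the only remaining options are a larger machine family, which has its own ceiling, or scaling out.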

How GCP services map to each pattern

Many managed and serverless services in Google Cloud handle scaling for you. Some offer horizontal autoscaling, some offer vertical autoscaling, and some offer both. Horizontal autoscaling is the more common pattern across the platform.

On Compute Engine, a managed instance group scales horizontally by adding or removing identical VMs in response to load. Changing the machine type of an existing VM is the vertical equivalent, and on Compute Engine that requires stopping the instance, changing the machine type, and starting it again. Cloud Run and App Engine standard scale horizontally by starting more instances of your container or service. Google Kubernetes Engine gives you both: the Horizontal Pod Autoscaler adds or removes pod replicas, and the Vertical Pod Autoscaler adjusts the CPU and memory requests of pods, typically by recreating them with the new values. Cluster Autoscaler then scales the underlying node pool horizontally, adding nodes when pods cannot be scheduled and removing underused ones.
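The Kubernetes documentation describes the Horizontal Pod Autoscaler's core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The controller skips scaling when that ratio is within a tolerance of 1.0, which defaults to 0.1. The Python sketch below reproduces only that formula plus min/max clamping. The real controller also handles unready pods, missing metrics, and stabilization windows, so treat this as an approximation. The function name and default bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica count for a single metric such as average CPU %."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Respect the configured minimum and maximum replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))    # ratio 1.5: scale out to 6
    print(desired_replicas(4, 63, 60))    # ratio 1.05: within tolerance, stay at 4
    print(desired_replicas(4, 20, 60))    # ratio 0.33: scale in to 2
    print(desired_replicas(10, 200, 50))  # ratio 4.0: 40, capped at max_replicas=10
```

Horizontal scaling here changes only the replica count. Each pod keeps the same CPU and memory requests, which is what the Vertical Pod Autoscaler would adjust instead.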

On the data side, BigQuery slot capacity scales horizontally under the hood. Spanner adds nodes to scale throughput horizontally while keeping a single logical database. Cloud SQL primarily scales vertically through machine type changes, with read replicas as a horizontal option for read traffic.
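Cloud SQL read replicas scale only reads, so the application (or a proxy in front of it) has to send writes to the primary and spread reads across replicas. The Python sketch below is a hypothetical router that shows this split. The endpoint names are made up, and the SELECT check is a deliberately crude classifier. Replicas can also lag behind the primary, so reads routed to them may be slightly stale.

```python
from itertools import cycle


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = cycle(list(replicas)) if replicas else None

    def endpoint_for(self, sql):
        statement = sql.strip().upper()
        # Crude classification: plain SELECTs are reads. Locking reads
        # (SELECT ... FOR UPDATE) must still go to the primary.
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("sql-primary", ["sql-replica-1", "sql-replica-2"])
    print(router.endpoint_for("SELECT * FROM orders"))           # sql-replica-1
    print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # sql-primary
    print(router.endpoint_for("select count(*) from orders"))    # sql-replica-2
```

Adding replicas widens read capacity horizontally, but every write still lands on the single primary, which can only grow vertically.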

What this looks like on the Professional Cloud Architect exam

Professional Cloud Architect questions often hand you a workload description and ask which scaling approach fits. The signal to look for is whether the workload tolerates running across many instances. Stateless web tiers, API services, and batch workers point to horizontal scaling on a managed instance group, GKE, or Cloud Run. A workload that depends on a single large in-memory dataset, or on a database engine that does not shard cleanly, points to vertical scaling on a larger machine type.

Another common framing is reliability. If the question emphasizes surviving instance failures or handling traffic spikes, horizontal scaling with autoscaling and a load balancer is usually the answer. If the question emphasizes a quick capacity bump on an existing system without re-architecting, vertical scaling is usually the answer.
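The two framings can be condensed into a small Python rule of thumb. This is a hypothetical study aid that encodes only the signals described above: whether the workload tolerates many instances, whether reliability or spikes are emphasized, and whether the goal is a quick bump without re-architecting. Real exam questions carry more context, so treat it as a mnemonic rather than a grader. The `Workload` fields are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tolerates_many_instances: bool       # stateless, or state kept in an external store
    emphasizes_failures_or_spikes: bool  # must survive instance loss or absorb traffic bursts
    wants_quick_bump_no_redesign: bool   # more capacity now, same architecture


def recommend_scaling(w: Workload) -> str:
    """Hypothetical rule of thumb distilled from the exam framings; not official guidance."""
    # A workload that cannot be spread across machines can only grow taller.
    if not w.tolerates_many_instances:
        return "vertical"
    # Surviving failures or absorbing spikes points to scaling out behind a load balancer.
    if w.emphasizes_failures_or_spikes:
        return "horizontal"
    # A fast capacity bump on an existing system without re-architecting points to scaling up.
    if w.wants_quick_bump_no_redesign:
        return "vertical"
    return "horizontal"


if __name__ == "__main__":
    web_tier = Workload(True, True, False)
    in_memory_analytics = Workload(False, False, False)
    print(recommend_scaling(web_tier))             # horizontal
    print(recommend_scaling(in_memory_analytics))  # vertical
```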

My Professional Cloud Architect course covers horizontal vs vertical scaling alongside the rest of the foundational architecture material.