
Autoscaling shows up in nearly every GKE-related question on the Professional Cloud Architect exam, and it is one of the topics where wrong answers come from confusing three things that sound similar but operate at completely different layers. GKE gives you three autoscalers: Cluster Autoscaler, Horizontal Pod Autoscaler, and Vertical Pod Autoscaler. Each one scales a different resource, and on the exam you need to pick the right one based on what is actually running out.
Cluster Autoscaler adjusts the number of nodes in your cluster based on the overall resource demands of the pods you are trying to run. If pods are pending because no node has enough CPU or memory to schedule them, Cluster Autoscaler adds nodes. When nodes sit underutilized, it removes them so you stop paying for capacity you do not need.
The best practice I want you to remember is to always set a minimum and maximum node count. The minimum gives you a floor for availability, and the maximum protects you from a runaway workload draining your budget by scaling out indefinitely.
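Setting that floor and ceiling is a one-flag change on the node pool. Here is a minimal sketch using `gcloud`; the cluster name, pool name, and zone are placeholders, and the 2/10 bounds are illustrative, not a recommendation:

```shell
# Enable Cluster Autoscaler on an existing node pool with an availability
# floor of 2 nodes and a budget ceiling of 10 nodes.
# "my-cluster", "default-pool", and the zone are placeholder values.
gcloud container clusters update my-cluster \
  --zone=us-central1-a \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=2 \
  --max-nodes=10
```

The same `--enable-autoscaling`, `--min-nodes`, and `--max-nodes` flags work at cluster-creation time with `gcloud container clusters create`.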
If a PCA question describes pods stuck in a Pending state because the cluster has run out of room, Cluster Autoscaler is the answer.
The Horizontal Pod Autoscaler, usually written as HPA, changes the number of pod replicas in a deployment based on observed usage. It does not restart pods. It adds or removes them, which means scaling events do not interrupt the pods that are already running.
HPA can scale on CPU utilization, memory utilization, or custom metrics that you define yourself. Custom metrics matter because most real workloads do not bottleneck on CPU first. A queue depth metric or a request-per-second metric is often a better signal, and HPA is the only autoscaler in GKE that can use those.
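As a concrete reference point, here is a minimal HPA manifest using the `autoscaling/v2` API, scaling a hypothetical Deployment named `web` on average CPU utilization; the names and thresholds are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # placeholder Deployment name
  minReplicas: 2           # never scale below this replica count
  maxReplicas: 10          # cap replica growth
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # target 60% average CPU across replicas
```

Swapping the `metrics` entry for a `type: Pods` or `type: External` metric is how you drive HPA off a custom signal such as queue depth.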
One caveat: some stateful workloads are not compatible with horizontal scaling. If a pod holds a persistent connection or owns a piece of local storage, adding more replicas does not necessarily distribute the load the way HPA assumes. Watch for that detail in PCA scenarios that involve databases or session-bound services.
The Vertical Pod Autoscaler, or VPA, takes a different approach. Instead of changing how many pods you have, it changes how much CPU and memory each pod gets. If a pod is consistently using more memory than its request, VPA bumps the request up. If it is over-provisioned, VPA brings it down.
Two limitations matter for the exam. First, VPA may need to restart pods to apply the new resource values, which can temporarily affect availability. Second, VPA cannot scale based on custom metrics. It works off observed CPU and memory usage only.
VPA is the right answer when a question asks about right-sizing individual pods rather than running more of them.
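For comparison with the HPA manifest, here is a minimal VPA sketch using the `autoscaling.k8s.io/v1` API against the same hypothetical `web` Deployment; the names are placeholders:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web            # placeholder Deployment name
  updatePolicy:
    updateMode: "Auto"   # VPA may evict and recreate pods to apply new requests
```

Setting `updateMode` to `"Off"` is a common first step: VPA then only publishes recommendations, which you can read before letting it restart anything.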
The framing I use is to ask what is actually running out. If you are running out of room on the nodes, that is Cluster Autoscaler. If a single pod cannot keep up with traffic and you want more copies of it, that is HPA. If a pod is mis-sized and needs more CPU or memory per replica, that is VPA.
HPA and Cluster Autoscaler are commonly used together. HPA adds more pods to meet demand, and when those new pods cannot fit on the existing nodes, Cluster Autoscaler responds by adding nodes. VPA, on the other hand, should generally not be combined with an HPA that scales on CPU or memory for the same workload, because both react to the same signals and can fight each other; pairing VPA with an HPA driven purely by custom metrics avoids that conflict.
For Professional Cloud Architect questions, expect scenarios where you need to identify which autoscaler addresses a specific symptom. Read for the signal: pending pods point to nodes, traffic spikes point to replica count, and resource mismatches point to per-pod sizing.
My Professional Cloud Architect course covers GKE autoscaling alongside the rest of the containers and serverless material.