Vertical Pod Autoscaler in Auto Mode on GKE for the PCA Exam

GCP Study Hub
Ben Makansi
February 11, 2026

One of the cleaner GKE optimization scenarios on the Professional Cloud Architect exam is the overprovisioned pod problem. The setup is almost always the same. A workload is running with pod resource requests that are way bigger than what the pods actually consume, the cluster is paying for capacity it never uses, and the question wants you to pick the autoscaler that fixes it. The answer is the Vertical Pod Autoscaler running in Auto mode, and the most common wrong answer is the Horizontal Pod Autoscaler. I want to walk through why.

What overprovisioned pods actually look like

Picture a GKE cluster running an application workload with five pods. Each pod has requested one CPU and two gigabytes of memory. Kubernetes uses those requests to decide where to schedule the pods, and it reserves that capacity on the node. So across the five pods, the cluster has reserved five CPUs and ten gigabytes of memory.
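In manifest terms, that request looks something like the fragment below. The Deployment name, labels, and image are illustrative, not from the scenario:

```yaml
# Illustrative Deployment: five replicas, each reserving 1 CPU and 2Gi
# of memory regardless of what the container actually consumes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # hypothetical workload name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: registry.example.com/web-app:v1  # placeholder image
        resources:
          requests:
            cpu: "1"       # scheduler reserves a full CPU per pod
            memory: 2Gi    # and 2Gi of memory per pod
```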

Then look at what the pods actually use. Each pod is consuming around 0.2 CPU and 0.5 gigabytes of memory. That means the workload as a whole is using one CPU and 2.5 gigabytes of memory. Four CPUs and 7.5 gigabytes of memory are sitting reserved and idle. That is roughly 80 percent of the requested CPU and 75 percent of the requested memory going to waste.
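The arithmetic is worth checking once, since the exam numbers follow the same pattern. A quick sketch using the figures from the scenario:

```python
# Sanity check of the overprovisioning arithmetic from the scenario above.
replicas = 5
requested_cpu, requested_mem_gb = 1.0, 2.0    # per-pod requests
actual_cpu, actual_mem_gb = 0.2, 0.5          # per-pod actual usage

# Capacity reserved but never used, across the whole workload.
idle_cpu = replicas * (requested_cpu - actual_cpu)        # CPUs sitting idle
idle_mem = replicas * (requested_mem_gb - actual_mem_gb)  # GB sitting idle

# Share of each request that is pure waste.
cpu_waste_pct = 100 * (1 - actual_cpu / requested_cpu)
mem_waste_pct = 100 * (1 - actual_mem_gb / requested_mem_gb)

print(idle_cpu, idle_mem, cpu_waste_pct, mem_waste_pct)
# → 4 idle CPUs, 7.5 idle GB, 80% CPU waste, 75% memory waste
```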

This is the textbook overprovisioning shape. Requests are set at a level that is safe for a worst-case load that never materializes, and the cluster pays for the gap. The exam scenario is asking you to close the gap without manually editing every workload spec and without constantly babysitting usage patterns.

Why Auto mode of the Vertical Pod Autoscaler is the answer

The Vertical Pod Autoscaler watches the actual CPU and memory usage of pods over time and computes what the requests should be. In the overprovisioned scenario, it sees pods running at 0.2 CPU and 0.5 gigabytes of memory and concludes that the right request is something like 0.25 CPU and 0.5 gigabytes, sized just above the steady-state consumption.

The reason the answer is specifically Auto mode is that Auto mode applies the recommendation. When the computed request drifts far enough from the current one, the Vertical Pod Autoscaler evicts the pod and recreates it with the updated requests, bringing the cluster reservation in line with reality. The other update modes do less: Off only emits recommendations without changing anything, and Initial applies them only when a pod is first created, so neither solves the cost problem for an already-running workload on its own. When a Professional Cloud Architect exam question asks for an automatic, hands-off fix to overprovisioning, Auto mode is what closes the loop.
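A minimal VerticalPodAutoscaler object in Auto mode might look like the sketch below, targeting the same hypothetical web-app Deployment. On GKE the cluster also needs Vertical Pod Autoscaling enabled, for example with gcloud container clusters update CLUSTER_NAME --enable-vertical-pod-autoscaling:

```yaml
# Sketch of a VPA in Auto mode. updateMode is the field that separates
# "apply the recommendation" from "just report it".
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa        # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app          # hypothetical target workload
  updatePolicy:
    updateMode: "Auto"     # evict and recreate pods with updated requests
```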

After the Vertical Pod Autoscaler acts, the same five pods now request 0.25 CPU and 0.5 gigabytes of memory. The actual usage stays where it always was, but the reservation matches it. The cluster is no longer paying for unused capacity, and there is no operator running around editing manifests.
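After the update, the relevant part of each pod spec effectively carries the right-sized requests. Numbers here are taken from the scenario above:

```yaml
# Effective per-pod requests once the VPA has acted.
resources:
  requests:
    cpu: 250m      # 0.25 CPU, sized just above the ~0.2 CPU steady state
    memory: 512Mi  # ~0.5 GB, matching observed usage
```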

Why the Horizontal Pod Autoscaler is the wrong answer here

The classic distractor in this scenario is the Horizontal Pod Autoscaler. It is the autoscaler people reach for first because it shows up in more GKE material, but it solves a different problem. The Horizontal Pod Autoscaler changes the number of pod replicas based on observed load. It scales the count up when load is high and down when load is low.

That does nothing for the overprovisioning shape. If each individual pod is sized at one CPU and two gigabytes of memory but only using a fraction of that, adding more replicas just multiplies the waste. Removing replicas would drop throughput. The problem is not how many pods are running. The problem is what each pod is asking for. The Horizontal Pod Autoscaler cannot rewrite per-pod requests, and so it cannot fix this scenario.
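For contrast, here is what a Horizontal Pod Autoscaler manifest actually controls, as a sketch with assumed min/max values. Every knob is about replica count, and the CPU target is utilization relative to the (oversized) request, so an overprovisioned workload reads as underutilized and gets scaled toward minReplicas rather than fixed:

```yaml
# Sketch of an HPA on the same hypothetical Deployment. Nothing in this
# object can change what each pod requests; it only moves the replica count.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2           # assumed bounds, not from the scenario
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization          # measured against requests, not capacity
        averageUtilization: 70
```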

The exam usually reinforces this by stating that the number of pods needs to stay constant for throughput reasons. That phrasing is the tell. If the replica count has to remain the same and the costs are coming from oversized requests, the answer is the Vertical Pod Autoscaler. The replica count is not the variable to change.

How I read the question on the Professional Cloud Architect exam

When a Professional Cloud Architect exam question describes a GKE workload with high resource requests and low actual utilization, I look for two signals. The first is what is being wasted. If the description talks about reserved CPU or memory that the pods are not using, the fix is at the per-pod request level. That points at the Vertical Pod Autoscaler.

The second signal is whether the workload needs the same number of pods. If the scenario implies that throughput, replica count, or availability across replicas should not change, the Horizontal Pod Autoscaler is off the table. Combine those two signals and the answer is the Vertical Pod Autoscaler in Auto mode.

The reason I emphasize Auto mode rather than just the Vertical Pod Autoscaler in general is that the exam will sometimes set up scenarios where a recommendation-only configuration would not satisfy the requirement of automatic remediation. Auto mode is the configuration that actually rewrites the pod specs without human intervention, and that is what the typical overprovisioning question is asking for.
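In manifest terms, a recommendation-only configuration corresponds to updateMode "Off" (or "Initial", which only applies recommendations at pod creation). A sketch, reusing the hypothetical names from earlier:

```yaml
# Recommendation-only VPA: computes right-sized requests but never
# applies them to running pods. Useful for observation, not for
# automatic remediation.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-observe   # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"         # report recommendations in status only
```

Running kubectl describe vpa web-app-vpa-observe then surfaces the recommended requests under the object's Status section without any pods being touched.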

If you want a deeper walk through GKE autoscaling alongside the rest of the advanced architecture material, my full course is at https://gcpstudyhub.com/courses/professional-cloud-architect.
