Cluster Autoscaler vs Horizontal Pod Autoscaler vs Vertical Pod Autoscaler

Ben Makansi
January 2, 2026

GKE has three different autoscalers, and they scale three different things. The Associate Cloud Engineer exam tests this distinction directly. Keep the three straight in your head and the questions become straightforward. This article covers what each one does, when each is appropriate, and the patterns the ACE exam uses to test them.

It does not cover every metric backend, the math behind HPA's scaling decisions, or how to tune the cluster autoscaler's expander. The goal is to give you the version that maps to ACE exam questions.

The three layers

Each autoscaler operates on a different layer of the GKE stack. Once you see the layers clearly, the names become obvious.

Cluster Autoscaler scales nodes. It watches the cluster as a whole. When pods cannot be scheduled because no node has room for them, the cluster autoscaler adds nodes. When nodes are sitting underutilized, it removes them. The unit it changes is the number of nodes in a node pool.
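Node-level autoscaling is configured per node pool. A minimal sketch, assuming a cluster named my-cluster with a node pool named default-pool in us-central1-a (all placeholder names):

```shell
# Enable the cluster autoscaler on an existing node pool,
# bounded between 1 and 5 nodes.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 --max-nodes 5 \
  --zone us-central1-a
```

The min/max bounds matter: they are what keeps a runaway workload from provisioning unbounded node capacity.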

Horizontal Pod Autoscaler (HPA) scales the number of pods. It watches individual deployments. When the existing pods are using a lot of CPU or some other metric, HPA creates more pods of the same type. When usage drops, HPA removes pods. The unit it changes is the replica count for a given deployment.
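A minimal HPA manifest, sketched with the autoscaling/v2 API ("web" is a placeholder Deployment name):

```yaml
# Keep average CPU utilization near 60% across 2-10 replicas
# of the "web" Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

The same thing can be created imperatively with kubectl autoscale deployment web --cpu-percent=60 --min=2 --max=10.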

Vertical Pod Autoscaler (VPA) scales the size of pods. It watches individual pods' actual resource usage. When a pod is consistently using more CPU or memory than its requested allocation, VPA changes the request to match real usage. The unit it changes is the CPU and memory requests of pods, not how many pods exist.
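A minimal VPA manifest, sketched against the same placeholder "web" Deployment:

```yaml
# VPA watching the "web" Deployment.
# updateMode "Auto" lets VPA evict and recreate pods with revised
# CPU/memory requests; "Off" makes it recommend-only, so nothing restarts.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"
```

Note that the spec changes requests, not replica count: there is no minReplicas/maxReplicas here.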

Three different things. Cluster Autoscaler changes node count. HPA changes pod count. VPA changes pod size.

How they work together

The three autoscalers are designed to work together, not as alternatives.

HPA decides "we need more pods." Those new pods get scheduled by Kubernetes. If there is room on existing nodes, they land there. If there is not, the pods enter Pending. The Cluster Autoscaler sees pending pods and concludes "we need more nodes," so it provisions one. The new node accepts the pending pods. The application now has more pods on more nodes.

VPA operates on a different axis. While HPA is changing how many pods you have, VPA is changing how big each pod is. They do not directly conflict, but there is a key caveat: VPA may restart pods to apply new resource requests, because Kubernetes does not allow you to change a running pod's resource requests in place. That restart can be disruptive for stateful workloads.

What metrics they use

This is one of the most frequently tested distinctions on the ACE exam.

HPA can scale based on CPU, memory, or custom metrics. The custom metrics support is the important part: if you want HPA to scale based on requests-per-second from your application, queue depth in Cloud Pub/Sub, or anything other than infrastructure metrics, HPA is the autoscaler that supports that.
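As an illustration, here is a sketch of an HPA driven by an external metric: Pub/Sub backlog depth surfaced through Cloud Monitoring. This assumes the custom metrics adapter is installed in the cluster, and the names ("worker", "my-subscription") are placeholders:

```yaml
# Scale the "worker" Deployment on undelivered Pub/Sub messages,
# targeting roughly 100 backlogged messages per replica.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: my-subscription
      target:
        type: AverageValue
        averageValue: "100"
```

Neither VPA nor the Cluster Autoscaler can do this; only HPA accepts a metric that is not CPU or memory.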

VPA scales based on actual CPU and memory usage of pods. It does not support custom metrics. If your scaling decision needs to be based on something other than CPU/memory, VPA is not the answer.

The Cluster Autoscaler does not use metrics in the same sense. It reacts to pending pods and underutilized nodes. Its decision is "is there a pod that cannot be scheduled" and "is there a node that nothing needs."

The restart caveat for VPA

One detail that comes up: VPA may restart pods to apply changes. HPA does not restart pods; it just adds or removes them. This matters for stateful or long-running workloads where a restart is more than a small cost.

If a question asks about scaling without restarting pods, HPA is the answer. If the question asks about right-sizing pods based on actual usage, VPA is the answer, but with the trade-off that pods may be restarted.

When to use which

Cluster Autoscaler is something you usually want enabled in any production GKE cluster. It is the safety net that makes sure you have enough nodes for the pods that need to run, without paying for idle node capacity. Set a min and max node count per node pool to bound the behavior.

HPA is for handling load variation in your applications: a web service that gets more traffic during the day, or a queue worker that needs to scale up when the queue is deep. If the workload's required capacity changes with load, HPA is the right autoscaler.

VPA is for getting resource requests right. If you are not sure how much CPU or memory your pods actually need, and you want Kubernetes to figure it out from observed usage, VPA can do that. The catch is the potential for restarts.

What the ACE exam actually tests

The exam tests this in a few patterns.

The first pattern is the "what do I scale" question. The scenario describes a problem (load is increasing, pods need more resources, the cluster ran out of capacity) and asks which autoscaler to use. Pattern: load increasing, more pods needed, answer is HPA. Pods need more CPU/memory, answer is VPA. Cluster ran out of node capacity, answer is Cluster Autoscaler.

The second pattern is the custom metrics question. The scenario describes wanting to scale based on a non-CPU metric (queue depth, RPS, business metric). The answer is HPA, because it is the only one of the three that supports custom metrics.

The third pattern is the "no restart" question. The scenario describes wanting to scale a workload that should not be interrupted. HPA is the answer because it adds or removes pods rather than modifying existing ones.

If you see "scale based on custom metric" in a question, think HPA. If you see "right-size pod resources," think VPA. If you see "add nodes when pods cannot be scheduled," think Cluster Autoscaler.

The bottom line

Three autoscalers, three layers. Cluster Autoscaler scales nodes. HPA scales pod count and supports custom metrics. VPA scales pod size based on CPU/memory usage and may restart pods. They are designed to work together, not as alternatives.

The Associate Cloud Engineer exam tests this almost entirely through scenario-matching. Identify what is being scaled (nodes, pod count, pod size), and you have your answer.

My Associate Cloud Engineer course covers all three autoscalers in the GKE scaling section, including how they interact with node pools, the cluster autoscaler's min/max settings, and how HPA integrates with custom metrics from Cloud Monitoring.
