Preemptible Nodes in GKE: Cutting Costs with Spot Instances

Ben Makansi
December 29, 2025

Preemptible nodes (also called spot nodes) are a way to run GKE workloads at a fraction of the normal cost. The trade-off is that Google can reclaim them at any time. This article covers how they work, when they are appropriate, and the specific way the Associate Cloud Engineer exam tests them.

It does not cover every preemptible pricing detail, the difference between the older preemptible VMs and the newer Spot VMs in depth, or every nodeAffinity rule you can write. The goal is to give you the version that maps to ACE exam questions.

What preemptible nodes are

Preemptible nodes are Compute Engine VMs that Google can shut down at any time, typically when capacity is needed elsewhere. In exchange for that interruptibility, you pay much less. The discount is usually substantial, often 60 to 90 percent off the standard rate, which makes them attractive for workloads that can tolerate interruptions.

"Preemptible" and "spot" are essentially the same thing in this context. The terminology shifted over time. Older docs and the ACE exam tend to use "preemptible." Newer docs sometimes use "spot." If you see either term on the exam, they mean the same thing for the purposes of GKE node configuration.

What happens when a preemptible node is reclaimed

When Google reclaims a preemptible node, that node disappears from the cluster. Any pods that were running on it get terminated. Kubernetes then tries to reschedule those pods onto other nodes in the cluster.

If there are other nodes with spare capacity, the pods come back relatively quickly on those nodes. If there are no other nodes that can accept them, the pods enter a Pending state and wait. If your cluster also runs the cluster autoscaler, it may eventually provision a new node to accommodate the pending pods. There is no guarantee about when a reclaimed preemptible node will come back, or whether it will come back at all in the same form.
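You can watch this play out from the cluster's side with standard kubectl commands. A sketch (the pod name is a placeholder):

```shell
# List pods stuck in Pending after a node reclamation
kubectl get pods --field-selector=status.phase=Pending

# The Events section explains why the scheduler cannot place the pod,
# e.g. a message like "0/3 nodes are available: 3 Insufficient cpu."
kubectl describe pod <pod-name>
```

If the cluster autoscaler is enabled, the same Events section will also show whether a scale-up was triggered on the pod's behalf.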

This is the central thing to understand: a preemptible node going away is a normal expected event, not a failure. Your application has to be okay with that.

When preemptible nodes are appropriate

The right workloads for preemptible nodes are fault-tolerant ones. Batch processing jobs that can be retried. Data processing jobs where occasional restarts add a bit of latency but do not break correctness. Machine learning training jobs that checkpoint regularly. Stateless web workers in environments where some capacity loss is acceptable. CI/CD runners.

The wrong workloads are critical, latency-sensitive, or stateful. Databases that need to maintain quorum. Services that hold long-lived connections to users. Anything where unexpected pod death causes a real problem rather than a small inconvenience. Those should run on standard nodes.

The pattern most teams use in production is mixed: a preemptible node pool for the fault-tolerant work, and a standard node pool for the critical work, in the same cluster.

Mixing preemptible and standard nodes

You can combine preemptible and standard node pools in a single GKE cluster. The way you control which workloads land where is through node labels and node selectors.

A common pattern is to label preemptible nodes with spot=true and standard nodes with spot=false. Then in your pod spec, you use a nodeSelector to direct pods to the right pool. Critical pods get nodeSelector: spot: "false" so they only schedule on standard nodes. Fault-tolerant pods get nodeSelector: spot: "true" so they only schedule on preemptible nodes.

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  nodeSelector:
    spot: "true"
  containers:
  - name: batch-worker
    image: gcr.io/my-project/batch-worker:v1

This gives you a single cluster that can run both kinds of workloads, with the cost savings of preemptible nodes where they make sense and the reliability of standard nodes where it matters.
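For completeness, the critical-workload counterpart is the same pod spec with the selector flipped. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-api
spec:
  nodeSelector:
    spot: "false"   # only schedule on standard (non-preemptible) nodes
  containers:
  - name: critical-api
    image: gcr.io/my-project/critical-api:v1
```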

Creating a preemptible node pool

You create a preemptible node pool using the gcloud container node-pools create command with the --preemptible flag:

gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --preemptible \
  --node-labels=spot=true \
  --num-nodes=3 \
  --zone=us-central1-a

The --node-labels flag is what attaches the spot=true label to every node in this pool. Without that label, your nodeSelector logic would have nothing to match against.
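One way to confirm the label took effect is to filter nodes by it. GKE also attaches its own label, cloud.google.com/gke-preemptible=true, to preemptible nodes automatically, so selecting on that built-in label instead of a custom one is also an option:

```shell
# Nodes carrying the custom label set via --node-labels
kubectl get nodes -l spot=true

# Nodes GKE itself marked as preemptible
kubectl get nodes -l cloud.google.com/gke-preemptible=true
```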

What the ACE exam actually tests

A few patterns come up.

The first is the cost-tolerance pattern. The scenario describes a fault-tolerant workload, like batch processing or non-critical jobs, where the team wants to reduce costs. The answer involves preemptible nodes. If you see "fault-tolerant" or "batch processing" alongside "minimize cost" in a GKE question, preemptible nodes are usually part of the answer.

The second pattern is the mixed-cluster scenario. The question describes a cluster with both critical and non-critical workloads and asks how to optimize cost without compromising the critical parts. The answer is to use both node pool types and use labels with nodeSelector to direct pods appropriately. Critical pods to standard, fault-tolerant pods to preemptible.

The third pattern is recognizing why a pod went into Pending. If a workload was running on a preemptible node and the node was reclaimed, the pod can stay Pending until the cluster has capacity to reschedule it. The exam sometimes tests this as a "why did the workload pause" question.

If you see "fault-tolerant" plus "reduce cost" in a GKE question, think preemptible nodes. If you see a mixed-criticality cluster, think two node pools with labels and nodeSelectors.

The bottom line

Preemptible nodes are cheap nodes that can be reclaimed by Google at any time. They are great for fault-tolerant workloads and a poor fit for anything that cannot tolerate interruption. The standard pattern is to combine a preemptible node pool with a standard node pool in the same cluster, using labels and nodeSelectors to route workloads to the right pool.

The Associate Cloud Engineer exam tests this most often as a cost-optimization question for batch or fault-tolerant workloads. If the scenario calls out interruptibility tolerance and cost sensitivity, preemptible nodes are the answer.

My Associate Cloud Engineer course covers preemptible nodes in the GKE node management section, alongside node pools, the cluster autoscaler, and the labeling patterns used to route workloads between pools.
