Node Pools in GKE: Mixing Machine Types in a Single Cluster

Ben Makansi
December 31, 2025

Node pools are how you mix different machine types within a single GKE cluster. This article covers what a node pool is, why you would have more than one in a cluster, the common patterns, and how the Associate Cloud Engineer exam tests this.

It does not cover every node pool flag, the regional vs zonal node pool distinction in detail, or how node pools interact with custom networking. The goal is to give you the ACE exam version, which is mostly about recognizing when multiple node pools are the right answer.

What a node pool is

A node pool is a group of nodes within a GKE cluster that share the same configuration. Same machine type, same disk size, same image, same labels. When you create a GKE cluster, it comes with a default node pool. You can add more node pools to the same cluster, each with a different configuration.

The mental model is: a cluster is the overall environment, and each node pool is a homogeneous slice of nodes within that environment. Different slices can have different shapes.
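You can see this structure on any existing cluster. A minimal example, assuming a zonal cluster named my-cluster (the name is hypothetical):

gcloud container node-pools list \
  --cluster=my-cluster \
  --zone=us-central1-a

A fresh cluster shows a single default-pool entry; each pool you add appears as another row with its own machine type.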

Why you would mix machine types

The whole point of node pools is to optimize for workloads that have different resource needs. A few patterns come up repeatedly.

Compute-optimized for CPU-heavy work, standard for everything else. If part of your workload is video encoding or scientific computation, you want compute-optimized machine types for those pods. The rest of your application probably runs fine on general-purpose nodes. Two node pools, one compute-optimized, one standard, lets you size each appropriately.

GPU pools for ML workloads. If you run training or inference jobs that need GPUs, you put GPU-equipped nodes in a separate node pool. The rest of your workload runs on cheaper, GPU-free nodes. Without separate pools, you would either pay GPU prices for everything (wasteful) or have no place to schedule GPU workloads (broken).

Mixed reliability tiers. As covered in the preemptible nodes article, a common pattern is one preemptible (spot) node pool for fault-tolerant work and one standard node pool for critical work. This is fundamentally a node pool decision.

Different OS images or kernel versions. If a specific workload needs a particular node image, you can put it in its own node pool with that image while the rest of the cluster runs on a different image.
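As a concrete sketch of the first pattern, here are two pools sized differently. All names and machine types are illustrative:

# Compute-optimized pool for the encoding workload
gcloud container node-pools create encode-pool \
  --cluster=my-cluster \
  --machine-type=c2-standard-8 \
  --node-labels=workload=encoding \
  --zone=us-central1-a

# General-purpose pool for everything else
gcloud container node-pools create general-pool \
  --cluster=my-cluster \
  --machine-type=e2-standard-4 \
  --zone=us-central1-a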

Routing pods to specific node pools

Having multiple node pools does no good if Kubernetes schedules your pods randomly across all of them. You need to direct workloads to the right pool. The mechanism is node labels combined with nodeSelector in your pod spec.

Each node pool can be created with labels that get applied to every node in it. For example, a GPU pool might be labeled accelerator=nvidia. Then in your pod spec, you add a nodeSelector that matches that label:

apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  nodeSelector:
    accelerator: nvidia
  containers:
  - name: training-job
    image: gcr.io/my-project/trainer:v1

That pod will only schedule on nodes labeled accelerator=nvidia, which means only on the GPU node pool. One caveat: nodeSelector constrains this pod, not other pods. By itself it does not keep unlabeled pods off the GPU nodes; that takes a taint on the pool. For GPU pools, GKE handles this automatically by tainting GPU nodes with nvidia.com/gpu, so pods that do not tolerate that taint stay away.
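If the pod actually consumes a GPU, it should also request it as an extended resource; that request is what triggers GKE to add the matching taint toleration for you. A sketch extending the spec above (image name is hypothetical):

spec:
  nodeSelector:
    accelerator: nvidia
  containers:
  - name: training-job
    image: gcr.io/my-project/trainer:v1
    resources:
      limits:
        nvidia.com/gpu: 1  # GPU request; GKE adds the matching toleration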

The same pattern works for every other multi-node-pool scenario. Label your preemptible nodes spot=true and your standard nodes spot=false, then use nodeSelector to send fault-tolerant work to spot=true and critical work to spot=false.
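A minimal sketch of that setup, with hypothetical names (--spot requests Spot VMs, the current form of preemptible capacity):

gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --node-labels=spot=true \
  --num-nodes=3 \
  --zone=us-central1-a

In the pod spec, the matching selector is spot: "true" (label values are strings, so quote it). GKE also applies its own label, cloud.google.com/gke-spot=true, which you can select on instead of defining your own.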

Creating a node pool

The gcloud command for adding a node pool to an existing cluster:

gcloud container node-pools create gpu-pool \
  --cluster=my-cluster \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --node-labels=accelerator=nvidia \
  --num-nodes=2 \
  --zone=us-central1-a

The flags worth knowing for the ACE exam are --machine-type, --accelerator (for GPU pools), --node-labels, --num-nodes, and --preemptible or --spot (for discounted, reclaimable pools). The Cloud Console UI exposes the same options.
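Two companion commands worth knowing, again with hypothetical names: describing a pool and resizing it after creation.

gcloud container node-pools describe gpu-pool \
  --cluster=my-cluster \
  --zone=us-central1-a

gcloud container clusters resize my-cluster \
  --node-pool=gpu-pool \
  --num-nodes=4 \
  --zone=us-central1-a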

What the ACE exam actually tests

A few patterns come up.

The first is the GPU question. The scenario describes a team that needs to run ML training or other GPU workloads on a GKE cluster. The answer is to add a GPU-enabled node pool with an appropriate label, and use nodeSelector in the pod spec to schedule the GPU pods on that pool. This question tests whether you know that GPU support is per-node-pool, not per-cluster, and that you direct pods with nodeSelector.

The second pattern is the mixed-workload cluster. The scenario describes critical and non-critical workloads in one cluster. The answer is to use multiple node pools (standard and preemptible) with labels and nodeSelectors. This is essentially the same architectural pattern.

The third pattern is recognizing what node pools cannot do. They cannot run on different versions of the GKE control plane (the control plane is per-cluster). They cannot have different VPCs or networks (those are per-cluster). The exam occasionally tests this in subtle ways by including answer choices that suggest node pools can solve a problem they actually cannot.

If you see "GPU workloads" or "different machine types in one cluster" in a question, think node pools. If you see "nodeSelector" in a Kubernetes manifest, think node pools labeled appropriately.

The bottom line

A node pool is a homogeneous group of nodes within a GKE cluster. Multiple node pools let you mix machine types in one cluster for cost optimization, GPU workloads, or mixed reliability tiers. You route pods to specific pools using node labels and nodeSelector.

The Associate Cloud Engineer exam tests this most often as a GPU scheduling question or a mixed-workload cost question. If the scenario calls out heterogeneous workloads in one cluster, multiple node pools with labels are usually the answer.

My Associate Cloud Engineer course covers node pools in the GKE cluster management section, alongside the cluster autoscaler and the patterns for combining preemptible and standard pools.
