
Note (2026-05-06): Vertex AI was rebranded as Gemini Enterprise Agent Platform. Google's exam guides still use the Vertex AI naming, so this article does too. The official guides may switch to the new name at some point as you prep, but for now we're matching the language currently in the exam materials.
When I prep architects for the Professional Cloud Architect exam, the AI hardware questions are some of the easiest to get right once you understand what each processor type is actually built for. Google has three main compute options for ML workloads on GCP, and the exam expects you to pick the right one based on the workload pattern, not based on which one sounds the most powerful.
Here is how I think about CPUs, GPUs, TPUs, accelerator configs, and the AI Hypercomputer term that has been showing up in question banks.
CPUs, or Central Processing Units, are the general-purpose processors most workloads run on. They are designed for sequential processing, with a small number of powerful cores, typically anywhere from 2 to 64. Each core is optimized for handling diverse computational tasks and complex branching logic.
That architecture is a good fit for the kind of work that does not parallelize cleanly. Data preprocessing is a classic example. Parsing text files, handling missing values, transforming data types, applying business rules, those operations do not all look the same and they often depend on the result of the previous step. CPUs handle that well because each core can switch between operations and make decisions based on the data it sees.
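To make the shape of that work concrete, here is a minimal preprocessing sketch in Python with pandas. The file name, column names, and the business rule are invented purely for illustration; the point is that each step branches on the data and depends on the step before it, which is exactly the pattern CPUs handle well.

```python
import pandas as pd

# Hypothetical input file and columns, used only to illustrate the pattern.
df = pd.read_csv("orders.csv")

# Handle missing values: each column needs its own rule.
df["quantity"] = df["quantity"].fillna(0)
df["region"] = df["region"].fillna("unknown")

# Transform data types so later steps see consistent values.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["quantity"] = df["quantity"].astype(int)

# Apply a business rule that depends on the columns cleaned above.
df["priority"] = (df["quantity"] > 100) & (df["region"] != "unknown")
```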
For the Professional Cloud Architect exam, the rule of thumb is: CPUs are good for the work that happens before training, like preprocessing and feature engineering, and for training classical machine learning models, things like linear regression, decision trees, and support vector machines. Those algorithms do not need massive parallel computation, so a CPU-based instance is a cost-effective foundation.
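On the classical-ML side, a sketch like the following trains comfortably on a CPU-only instance; the synthetic dataset is just a stand-in for real feature-engineered data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic tabular data standing in for a real feature-engineered dataset.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Classical models like decision trees need no accelerator to train quickly.
model = DecisionTreeClassifier(max_depth=8)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```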
GPUs, or Graphics Processing Units, take a fundamentally different approach. They were originally designed for graphics rendering, where you need to process thousands of pixels at the same time to create real-time visuals. That origin is why they are so useful for machine learning.
Where a CPU might have 8 or 16 powerful cores, a modern GPU contains thousands of smaller, simpler cores. Those cores are not as individually powerful as CPU cores, but they are optimized for parallel processing of similar tasks at the same time.
That maps directly onto deep learning. Training a neural network involves massive amounts of matrix multiplication and similar mathematical operations. Those operations are highly repetitive and can be broken into thousands of smaller, identical calculations that do not depend on each other. A GPU running a batch through a convolutional neural network can apply the same convolution operation to different parts of all the images in parallel. Each core handles a small piece, and collectively they finish the batch much faster than a CPU working through the operations sequentially.
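As a rough illustration, the same batched matrix multiplication can be pinned to whichever device is available in TensorFlow; on a GPU machine it runs across thousands of cores at once, while on a CPU it grinds through the batch with far fewer. The sizes here are arbitrary, chosen only to make the batch large enough to matter.

```python
import tensorflow as tf

# A batch of large matrices, standing in for the tensors inside a training step.
a = tf.random.normal([64, 1024, 1024])
b = tf.random.normal([64, 1024, 1024])

# Pick the GPU if one is attached to this machine, otherwise fall back to CPU.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
with tf.device(device):
    c = tf.linalg.matmul(a, b)  # every matrix in the batch is multiplied in parallel

print("ran on:", device, "result shape:", c.shape)
```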
On the exam, when you see deep neural network training, computer vision, or any workload built around large-scale matrix operations, GPUs are the default answer.
TPUs, or Tensor Processing Units, are Google's purpose-built ML hardware. Unlike CPUs and GPUs, which were adapted for ML workloads, TPUs were designed from the ground up for machine learning computations.
The idea behind TPUs is that machine learning, especially deep learning, involves very specific mathematical operations that occur over and over. Google built hardware optimized for those patterns. TPUs feature specialized matrix multiplication units and an optimized memory architecture designed around tensor operations; tensors are the fundamental data structures in modern ML frameworks.
That specialization shows up in large-scale training. Training a large transformer or a complex network with millions of parameters means moving large amounts of data between memory and processing units efficiently, and TPUs are engineered for those data flows. They are particularly effective for very large-scale deep neural network training, especially with TensorFlow, since Google optimized TPUs specifically for their own framework.
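For orientation, this is roughly what targeting a TPU looks like in TensorFlow 2.x. It is a sketch, not a complete training job: the empty resolver argument assumes the runtime exposes a default TPU, and the tiny Keras model is a placeholder for a real architecture.

```python
import tensorflow as tf

# Connect to the attached TPU; the empty argument assumes the runtime exposes
# a default TPU, otherwise pass the TPU's name or address.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Anything built inside the strategy scope is replicated across the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```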
One nuance the exam expects you to know: in practice, GPUs are often sufficient or even preferred for many large-scale projects. If you need flexibility across frameworks, or the workload does not fully use the TPU's specialized capabilities, GPUs are still a reasonable choice. TPU is not automatically the right answer just because the workload is big.
The next concept worth knowing is how accelerators get attached. When you need GPU acceleration but want more control over your resource allocation, you can add GPU accelerators to general-purpose or other machine types within your worker pool. That gives you flexibility in balancing CPU, memory, and GPU resources for the specific workload.
A worked example: a worker pool with three workers, each running on an n1-standard-4 base machine type, each with 2 GPUs attached. Same base specs, same accelerator count, applied consistently across every replica in the pool.
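Here is a hedged sketch of how that pool could be declared with the Vertex AI Python SDK (google-cloud-aiplatform). The project, region, bucket, image URI, and GPU type are placeholders, and since Vertex AI treats the first worker pool as the single primary replica, the three workers are expressed as one primary plus two additional workers with identical specs.

```python
from google.cloud import aiplatform

# Placeholder project, region, and staging bucket, purely for illustration.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Same base machine and accelerator count on every replica in the pool.
machine_spec = {
    "machine_type": "n1-standard-4",        # base CPU/memory for each worker
    "accelerator_type": "NVIDIA_TESLA_T4",   # assumed GPU type for this example
    "accelerator_count": 2,                  # 2 GPUs attached per worker
}
container_spec = {"image_uri": "us-docker.pkg.dev/my-project/train/image:latest"}

# One primary replica plus two workers, for three identical GPU-backed machines.
worker_pool_specs = [
    {"machine_spec": machine_spec, "replica_count": 1, "container_spec": container_spec},
    {"machine_spec": machine_spec, "replica_count": 2, "container_spec": container_spec},
]

job = aiplatform.CustomJob(
    display_name="gpu-attached-training",
    worker_pool_specs=worker_pool_specs,
)
job.run()
```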
The advantage is customization. Rather than being limited to predefined GPU machine types with fixed CPU-to-GPU ratios, you start with a machine type that gives you the right amount of CPU and memory, then add exactly the number of GPUs you need. That matters when your training workload has requirements that do not line up with the standard GPU machine type configurations. Each worker operates independently with its own dedicated resources, so distributed training can split the workload across multiple GPU-accelerated instances while keeping consistent hardware throughout the pool.
You may see references to AI Hypercomputer on the Professional Cloud Architect exam, so it is worth being clear about what the term means.
AI Hypercomputer is not an actual service or setting in GCP or Vertex AI. You will not find it in the console or in any configuration option. It refers to Google's stack of low-latency TPU pods, the infrastructure that powers large-scale AI training.
The point of the term is that TPU pods offer specialized high-bandwidth links that outperform GPU clusters in multi-node training. When you are training a large model across multiple machines, the interconnect between those machines becomes critical, and TPU pods are designed with that in mind.
If you break the term down, there are three layers. At the top is Vertex AI Training, the service you actually use. Below that are TPU Pod Slices, the compute resources doing the work. Underneath is the High-bandwidth Interconnect Fabric, the specialized networking that connects the TPU pods together. The marketing name AI Hypercomputer wraps around all three.
So when you see AI Hypercomputer on the exam, just translate it to TPU pods with high-speed interconnect for distributed training. For now it is more of a marketing name than a product, even though the infrastructure underneath is real.
The AI hardware questions on the Professional Cloud Architect exam tend to follow a predictable shape. A scenario describes a workload, sometimes preprocessing, sometimes classical ML, sometimes deep learning at scale, and you pick the processor type that matches.
Default mappings I keep in my head: preprocessing and classical ML go to CPUs. Deep neural network training, computer vision, and large matrix workloads go to GPUs. Very large-scale training where TPU pods and TensorFlow are mentioned, especially multi-node training that calls out high-bandwidth interconnect, goes to TPUs. Accelerator attachment is the answer when the question emphasizes custom CPU-to-GPU ratios that the predefined GPU machine types cannot deliver. AI Hypercomputer is just a wrapper; translate it to TPU pods.
My Professional Cloud Architect course covers AI hardware selection on GCP alongside the rest of the advanced architecture material.