AI Infrastructure Layer: CPUs, GPUs, TPUs for the Generative AI Leader Exam

GCP Study Hub
Ben Makansi
April 30, 2026

If you zoom all the way to the bottom of the AI stack, you land at the infrastructure layer. This is the part of the picture that turns electricity into matrix multiplications. Without it, nothing above it in the stack exists. The Google Generative AI Leader exam expects you to know what lives at this layer and why each piece matters.

Here is how I break it down when I work through the material.

What the infrastructure layer is

The infrastructure layer provides the raw computational power needed to train and run AI models. It includes the hardware: CPUs and specialized chips like TPUs and GPUs. It also includes the physical environment those chips need to operate, which is most of what makes modern AI so capital-intensive.

There are five components I want you to keep in mind. Data centers, the facilities whose server racks physically house all this compute. Cooling, because running thousands of chips at full load generates enormous heat. Networking, the high-speed connections that allow data to move between chips, servers, and data centers fast enough to keep training jobs running efficiently. And then the chips themselves: GPUs, which are currently the most widely used chip for AI workloads, and TPUs, Google's custom-designed chip built specifically for machine learning.

The scale of modern AI data centers

It is worth pausing on data centers because the scale here is genuinely hard to picture. OpenAI's Stargate center in Abilene, Texas, and Google's data center in Lenoir, North Carolina, are the kind of facilities you should have in mind when an exam question references AI infrastructure.

These are massive facilities consuming hundreds of megawatts of power, enough to rival the consumption of some entire cities. Individual campuses cost billions to build and equip. They house tens of thousands of GPUs and TPUs. And because all that compute generates extreme heat, they rely on liquid cooling systems to keep temperatures manageable.

This scale is part of why the cloud, like Google Cloud, exists in the first place. Almost no organization can justify building infrastructure like this on its own. The Generative AI Leader exam frames cloud platforms as the practical access point to this kind of compute.

CPUs: general-purpose, sequential processing

CPUs, or Central Processing Units, are the workhorses we are most familiar with. They are general-purpose processors designed for sequential processing, meaning they handle tasks one after another in a methodical way.

What makes CPUs distinctive is their architecture. They have a small number of powerful cores, typically anywhere from 2 to 64, and each core is optimized for handling diverse computational tasks and complex branching logic.

This is why CPUs work well for the tasks that come before training, like data preprocessing and feature engineering, and for training classical machine learning models such as linear regression, decision trees, and support vector machines. Those algorithms do not require massive parallel computation, but they do benefit from the CPU's ability to handle varied computational patterns efficiently. On Google Cloud, when you are working with smaller datasets or traditional ML algorithms, CPU-based instances provide a cost-effective foundation.
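To make that concrete, here is a minimal sketch of the kind of CPU-friendly work described above: loading a small tabular dataset, scaling features, and fitting a classical model with scikit-learn. The file name and label column are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical tabular dataset with a "churned" label column.
df = pd.read_csv("customers.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Scaling plus logistic regression: branch-heavy, mostly sequential work
# that a handful of powerful CPU cores handles comfortably.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Nothing here needs a GPU; a modest CPU instance runs this end to end.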

GPUs: thousands of cores, parallel processing

GPUs, or Graphics Processing Units, take a fundamentally different approach to computation. They were originally designed for graphics rendering, where you need to process thousands of pixels simultaneously to create smooth, real-time visuals. That origin actually explains a lot about why they are so useful for machine learning.

The key architectural difference is in the cores. Where a CPU might have 8 or 16 powerful cores, a modern GPU contains thousands of smaller, simpler cores. These cores are not as individually powerful as CPU cores, but they are optimized for parallel processing of similar tasks simultaneously.

This matters because when you train a deep learning model, you are performing massive amounts of matrix multiplication and similar mathematical operations. These operations are highly repetitive and can be broken down into thousands of smaller, identical calculations that do not depend on each other. This is exactly where GPUs shine. When processing a batch of images through a convolutional neural network, the GPU can apply the same convolution operation to different parts of all the images in parallel, with each core handling a small piece of the computation.
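As a rough illustration of that batched, data-parallel pattern, here is a short PyTorch sketch that applies one set of convolution filters to an entire batch of images in a single call, on a GPU if one is available. The tensor shapes are arbitrary and only for illustration.

```python
import torch
import torch.nn.functional as F

# Use the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A batch of 64 RGB images (3 channels, 224x224) and a bank of 16
# 3x3 convolution filters.
images = torch.randn(64, 3, 224, 224, device=device)
filters = torch.randn(16, 3, 3, 3, device=device)

# One call applies the same convolution across the whole batch; on a GPU
# that work is spread over thousands of cores in parallel.
features = F.conv2d(images, filters, padding=1)
print(features.shape)  # torch.Size([64, 16, 224, 224])
```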

That parallel processing capability makes GPUs particularly effective for training deep neural networks, computer vision tasks, and any machine learning workload that involves large-scale matrix operations. NVIDIA is currently the most well-known maker of GPU chips.

CPUs vs GPUs in one line

The comparison is worth holding clearly in your head. Both chips have cores, but the structure of those cores is what makes them suited for different work.

The CPU has a small number of large cores. The GPU has a dense grid of much smaller ones. Few cores, sequential processing versus many cores, parallel processing. For AI training, breadth wins.
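If you want to see "breadth wins" for yourself, a quick and unscientific timing comparison makes the point: the same large matrix multiplication, once on the CPU and once on the GPU if one is present. Exact numbers vary wildly by hardware; treat this as a sketch, not a benchmark.

```python
import time
import torch

a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# First on the CPU's handful of large cores...
start = time.perf_counter()
_ = a @ b
print(f"CPU: {time.perf_counter() - start:.3f} s")

# ...then on a GPU's thousands of small cores, if one is available.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to finish
    print(f"GPU: {time.perf_counter() - start:.3f} s")
```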

TPUs: Google's custom ML chip

TPUs, or Tensor Processing Units, represent Google's approach to purpose-built machine learning hardware. Unlike CPUs and GPUs, which were adapted for ML workloads, TPUs were designed from the ground up specifically for machine learning computations.

The key insight behind TPUs is that machine learning, particularly deep learning, involves very specific types of mathematical operations that occur repeatedly. Google analyzed the computational patterns in their ML workloads and built hardware optimized specifically for those patterns. TPUs feature specialized matrix multiplication units and a memory architecture designed around tensor operations; tensors are the fundamental data structures in most modern ML frameworks.

This specialization makes TPUs particularly effective for very large-scale deep neural network training, especially when you are working with TensorFlow, since Google optimized TPUs specifically for their own framework. You will often see TPUs used for training large language models, transformer architectures, and other computationally intensive ML models where the scale justifies the specialized hardware.
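For a sense of what "TensorFlow on TPUs" looks like in practice, here is a hedged sketch of how a Keras training job typically attaches to a Cloud TPU using TPUStrategy. The connection details depend on your environment (Cloud TPU VM, Vertex AI, Colab), so treat the resolver setup as an assumption to adapt.

```python
import tensorflow as tf

# Connect to the TPU advertised by the environment; tpu="" assumes the
# runtime provides the address (adjust for your own setup).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so its variables
# and matrix operations are placed on the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# model.fit(...) then trains as usual, with each batch sharded across TPU cores.
```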

One nuance worth keeping in mind. In practice, GPUs remain sufficient or even preferred for many large-scale deep learning projects, particularly when you need flexibility across different frameworks or when the workload does not fully utilize the TPU's specialized capabilities. The Generative AI Leader exam will not corner you into picking TPUs every time Google appears in the question stem.

What to take into the exam

Know the five infrastructure components: data centers, cooling, networking, plus the chips. Know that CPUs are general-purpose with few large cores, good for preprocessing and classical ML. Know that GPUs have thousands of small cores, good for training deep neural networks. Know that TPUs are Google's custom chip, good for very large-scale DNN training, especially with TensorFlow, and that GPUs are often still the practical choice.

That mental map handles the infrastructure-layer questions on the Generative AI Leader exam without making you memorize chip-vendor trivia.

My Generative AI Leader course covers the AI infrastructure layer alongside the rest of the foundational material you need for the exam.
