Diffusion Models for the Generative AI Leader Exam

Ben Makansi

January 3, 2026

Most of what shows up on the Generative AI Leader exam concerns LLMs, which generate text. But generative AI does not stop at text. Image generation runs on a different class of model entirely, and the exam expects you to recognize the architecture by name.

That architecture is the diffusion model. In this article I want to walk through how diffusion models actually produce an image, why the process works the way it does, and what Google Cloud surface you should associate with it for the Generative AI Leader exam.

The core idea: reverse the noise

A diffusion model creates images by learning to reverse a process of gradually adding noise to an image. That single sentence captures the architecture.

During training, the model sees a large collection of images that get progressively destroyed by noise. A clear photograph degrades step by step until it becomes pure static. The model learns to predict the noise that was added at each step, which is what lets it undo that step. At inference time, it runs the process in reverse: it starts from random noise and progressively recovers a coherent image.
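
To make the noise-adding half concrete, here is a minimal sketch in Python with NumPy. It is a toy illustration, not how Imagen or any production model is implemented: the function name `add_noise`, the linear schedule, and the 1000-step count are all assumptions I picked for readability.

```python
import numpy as np

def add_noise(image, t, num_steps=1000, rng=None):
    """Return a noised copy of `image` at step t.

    t = 0 returns the clean image; t = num_steps returns pure noise.
    The linear schedule below is an assumption made for illustration.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Fraction of the original signal that survives at step t.
    signal_kept = 1.0 - t / num_steps
    noise = rng.standard_normal(image.shape)
    # Blend image and Gaussian noise so the noise share grows with t.
    return np.sqrt(signal_kept) * image + np.sqrt(1.0 - signal_kept) * noise

# Tiny stand-in for a photograph: a 4x4 grid of brightness values.
photo = np.linspace(0.0, 1.0, 16).reshape(4, 4)
for t in (0, 250, 500, 1000):
    noisy = add_noise(photo, t)
    # Average distance from the clean photo at this step.
    print(t, float(np.abs(noisy - photo).mean()))
```

During training, a real model would be shown noised images like these and asked to predict the noise that was mixed in. You do not need this level of detail for the exam, but it helps to see that "adding noise" is just a weighted blend of the image and random values.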

That framing matters because it changes how you think about image generation. The model is not drawing from scratch. It is removing structure-free noise in a structured way, guided by your prompt.

The four stages of generation

The path from prompt to image moves through four recognizable stages.

Stage one is random noise. Pure static. Completely random pixels with no meaningful structure. This is the starting point for every image the model generates.

Stage two is text-guided denoising. Your text prompt steers which noise gets removed and in which direction. The description shapes the image toward what you asked for. Without the prompt, the model has no signal to push the noise in any particular direction.

Stage three is progressive refinement. The image becomes clearer and more detailed with each denoising step, like a photograph slowly developing. Each step removes a little more noise and locks in a little more structure.

Stage four is the final image. After all denoising steps are complete, a coherent result emerges that matches the prompt. A typical visualization shows a blurry, noisy blob on the left and a sharp, detailed photo on the right, say of a cat. Both come from the same model and the same process. The only difference is how many denoising steps have been applied.
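
The four stages can be sketched as a short loop. This is a toy under loud assumptions: `toy_denoise_step` is a hypothetical stand-in for the trained network, and it cheats by being handed a `target` array that plays the role of "what the prompt describes." A real diffusion model has no such target. It predicts the noise to remove from the current image and an encoding of the prompt. The sketch only shows the shape of the process: start from noise, refine repeatedly, return the result.

```python
import numpy as np

def toy_denoise_step(x, target, strength=0.1):
    # Hypothetical stand-in for a prompt-conditioned network:
    # move the current image a small fraction of the way toward the target.
    return x + strength * (target - x)

def generate(target, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)      # stage one: random noise
    errors = []
    for _ in range(num_steps):                 # stages two and three: guided, progressive denoising
        x = toy_denoise_step(x, target)
        errors.append(float(np.abs(x - target).mean()))
    return x, errors                           # stage four: the final image

# Toy "prompt": a bright square on a dark background.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0

image, errors = generate(target)
print(errors[0], errors[-1])  # distance from the target after the first and last step
```

Stopping the loop early gives the blurry, noisy intermediate; running it to the end gives the clean result.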

Why this matters conceptually

The key intuition is that image generation is structured noise removal, guided by language. That is what makes diffusion models so powerful: pairing them with a text encoder lets you generate virtually anything you can describe.

A text-only LLM produces a sequence of tokens. A diffusion model paired with a text encoder produces a 2D grid of pixels. Both are generative AI. The mechanics are completely different, and the Generative AI Leader exam expects you to understand that distinction.

Imagen on Google Cloud

On Google Cloud, Imagen is Google's diffusion-based image generation model. It is available through Vertex AI. For the exam, the association you need to lock in is straightforward. Diffusion models are the architecture behind AI image generation, and Imagen is the Google Cloud product that uses that architecture.

If a question asks which model family generates images on Vertex AI, the answer is Imagen. If a question asks what kind of model Imagen is, the answer is a diffusion model. If a question asks how diffusion models work, the answer is that they reverse a noise-adding process to produce images from text prompts.

What to remember for the exam

Three things to carry into the Generative AI Leader exam.

First, diffusion models are a class of generative AI model that creates images. They are not the same architecture as LLMs.

Second, the mechanism is reversing a noise-adding process. Random noise in, coherent image out, with the text prompt guiding the denoising direction.

Third, Imagen on Vertex AI is the Google Cloud diffusion model you should be ready to name.

My Generative AI Leader course covers diffusion models and Imagen alongside the rest of the foundational material you need for the exam.