Google's Foundation Model Family for the Generative AI Leader Exam

GCP Study Hub
Ben Makansi
November 30, 2025

Once you have the concept of an agent in place, the next thing the Generative AI Leader exam expects you to know is what actually drives those systems. In the GCP ecosystem, those engines are called foundation models, and Google offers a suite of them rather than one model for every job. This article is a family-overview pass. Subsequent articles dig into each model on its own.

Why Google offers a suite, not a single model

Google provides a suite of foundation models that have been trained to power its own products and that are also available for you to build your own applications on. The reason it is a suite rather than a single model is efficiency. Instead of forcing one model to handle every possible task, GCP provides specialized tools for specific jobs, whether that is text, image, video, speech, code, or local deployment.

That framing matters for the Generative AI Leader exam because exam questions tend to present a specific business scenario, things like generating marketing videos or running a model on the company's own hardware, and ask you to select the correct model family. If you remember the suite as one undifferentiated pile of models, those questions get harder than they need to be. If you remember which model is built for which modality, they get easy.

The family at a glance

The foundation models you should be able to recognize on the Generative AI Leader exam are:

  • Gemini is the multi-purpose reasoning model. It is Google's flagship foundation model family and the one that handles complex logic across multiple modalities at once.
  • Imagen is the specialized model for visual tasks. If the goal is to generate or edit static images, Imagen is the model.
  • Veo is focused specifically on video creation, including text-to-video and image-to-video generation.
  • Chirp is the speech model. It sits in the speech-to-text role and is built on Google's Universal Speech Model (USM).
  • Codey is the text-to-code model. It produces functional code in multiple programming languages from natural-language prompts and is optimized for developer productivity rather than general conversation.
  • Gemma is the lightweight, open-weight option. It is designed to run on your own hardware or standard VMs rather than only through managed API endpoints.

Together those six are the fundamental building blocks for any generative application you intend to build on Google Cloud. Five of them, Gemini, Imagen, Veo, Chirp, and Codey, are first-party managed offerings; Gemma is the open-weight one that you can take and deploy yourself.

How to map a scenario to a model

The pattern the Generative AI Leader exam tends to follow is a short business description followed by a model-family choice. A useful way to read those questions is to start from the modality and the deployment constraint.

  • If the scenario asks for reasoning across mixed inputs, that is a Gemini cue.
  • If the deliverable is a still image or visual asset, that is an Imagen cue.
  • If the deliverable is a video file or anything time-based, that is a Veo cue.
  • If the input is audio and the output is text, that is a Chirp cue.
  • If the task is producing or debugging code, that is a Codey cue.
  • If the constraint is local deployment, lightweight footprint, or running on your own infrastructure, that is a Gemma cue.

The deeper article on each model goes through the specific use-case checks the exam tends to lean on. At the family-overview level, getting the modality-to-model mapping right is most of what you need.
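If it helps to have that mapping in one glanceable place, here is a tiny Python sketch of it as a lookup table. This is purely a study mnemonic, not a Google API; the cue strings are my own shorthand for the scenario keywords above.

```python
# A study mnemonic, not a Google API: the exam's modality/constraint
# cues mapped to the model family each one points at.
MODALITY_TO_MODEL = {
    "mixed-input reasoning": "Gemini",
    "still image": "Imagen",
    "video": "Veo",
    "speech to text": "Chirp",
    "code": "Codey",
    "local / open-weight deployment": "Gemma",
}

print(MODALITY_TO_MODEL["video"])  # Veo
```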

Where these models live

Almost all of these models are accessible through Vertex AI, which is the platform layer that sits above the models layer in the AI landscape stack. Gemini also has its own surfaces outside Vertex AI, including the consumer-facing Gemini web app and the developer-facing Gemini API, plus product integrations like Gemini for Workspace and Gemini for Google Cloud. Gemma is the exception to the managed-API pattern. Because it is open-weight, you take it and run it on your own hardware or on standard VMs.
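To make the two access patterns concrete, here is a minimal sketch of each. The exam does not test SDK syntax, and the project ID, region, and model IDs below (gemini-1.5-flash, google/gemma-2-2b-it) are illustrative placeholders, not recommendations.

```python
# Managed-API pattern: call Gemini through Vertex AI.
# Assumes the google-cloud-aiplatform package and a GCP project
# with Vertex AI enabled; project, region, and model ID are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize what a foundation model is.")
print(response.text)
```

Gemma goes the other way: you download the weights and run them where you like. A sketch using the Hugging Face transformers library, assuming the transformers and torch packages are installed and the Gemma license terms have been accepted:

```python
# Open-weight pattern: run Gemma on your own hardware or VM.
# The model ID is a placeholder; smaller Gemma variants fit on a laptop.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarize what a foundation model is.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The contrast between the two snippets is the exam-relevant point: the first never touches model weights and requires a GCP project, while the second never touches GCP at all.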

What to remember for the exam

The points to lock in from this topic are:

  • Google's foundation models are a suite of specialized models, not a single general-purpose model.
  • The reason for the suite is efficiency. Each model is built for a specific modality.
  • Gemini handles reasoning and multimodal inputs. Imagen handles still images. Veo handles video. Chirp handles speech. Codey handles code. Gemma is the lightweight, open-weight option for local deployment.
  • Exam questions on this topic typically describe a scenario and ask you to pick the right model family. The fastest path through those questions is to map the modality and deployment constraint to the model.

My Generative AI Leader course walks through each of these models in more depth alongside the rest of the foundational material, including the specific use-case checks the exam leans on for Imagen, Veo, Gemma, Chirp, and Codey.
