
When I first walked into the Gemini portion of Google's model lineup, the natural question was "what about the smaller stuff?" The Generative AI Leader exam covers that side of the family explicitly, and the answer is Gemma. Gemini lives in the cloud as a fully managed model. Gemma is the opposite design point: a family of lightweight, open-weight models you download and run on your own infrastructure.
Both qualifiers matter. "Lightweight" means the models are small enough to run efficiently on your own hardware instead of needing a managed cloud endpoint. "Open-weight" means Google publishes the model weights, so you can deploy them locally, modify them, and fine-tune them.
Three properties define Gemma for the Generative AI Leader exam: it is lightweight, its weights are open, and you deploy and run it on your own infrastructure.
Compare that with Gemini, which Google manages end-to-end. With Gemini you call an API and pay per request. With Gemma you take the weights, host them yourself, and avoid the per-call cost. That tradeoff is the whole point of the model family.
The exam tests this as a decision question. Pick Gemma over Gemini when three conditions hold: the task does not require multimodal input, you need deep customization, and cost or resource efficiency is the binding constraint.
The first criterion is the easy filter. If you need a model to analyze a video and a PDF together, you need Gemini's multimodal capability. If the task is text-based and straightforward, Gemma can handle it without carrying the extra weight of a multimodal frontier model.
The second criterion is about adaptability. Gemma is built to be modular. You can fine-tune it deeply on a niche dataset or get it running on specific hardware in a way that a massive API-based model does not allow.
The third criterion is cost. Gemma is lightweight and consumes fewer resources, so it shines when efficiency is the limiting factor.
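The three criteria above amount to a simple decision rule, and writing it down as code makes a useful study aid. This is a minimal sketch: the `Workload` fields and the `pick_model` function are illustrative names I made up for this example, not any official Google API.

```python
# Hypothetical decision helper encoding the three exam criteria.
# All names here are illustrative, not part of any real SDK.
from dataclasses import dataclass

@dataclass
class Workload:
    multimodal: bool             # needs e.g. video and PDF analyzed together
    needs_fine_tuning: bool      # deep customization on a niche dataset
    cost_sensitive: bool         # per-call API pricing would be prohibitive
    needs_offline: bool = False  # must keep working without internet access

def pick_model(w: Workload) -> str:
    # Criterion 1 is the easy filter: multimodal tasks are Gemini territory.
    if w.multimodal:
        return "Gemini"
    # Criteria 2 and 3: self-hosting wins when customization, cost,
    # or offline operation dominates.
    if w.needs_fine_tuning or w.cost_sensitive or w.needs_offline:
        return "Gemma"
    # Otherwise a managed endpoint is the lower-effort default.
    return "Gemini"

# The kiosk chatbot scenario: text-only, must survive losing connectivity.
print(pick_model(Workload(multimodal=False, needs_fine_tuning=False,
                          cost_sensitive=True, needs_offline=True)))  # Gemma
```

The order of the checks matters: a multimodal requirement rules Gemma out before any cost argument gets a vote, which mirrors how the exam scenarios are structured.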
Two scenarios make the tradeoff concrete. The first is a chatbot for a customer service kiosk that needs to keep working even if the internet drops. Local deployment is non-negotiable there, and a cloud-managed model is the wrong tool. The second is a startup building a text analyzer without a large cloud budget. Per-call API costs would eat the runway, so a model you host yourself is the right answer.
Both scenarios share the same shape. They want a capable model, they need cost efficiency, and they often need offline capability. None of those are strengths of a cloud-managed Gemini deployment.
The Generative AI Leader exam tends to frame Gemma questions as scenario picks. The signal words are "local", "on-device", "open-weight", "fine-tune", "offline", and "cost-sensitive". When you see those, default to Gemma. When you see "multimodal", "video and PDF together", or "managed by Google", default to Gemini.
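The signal words above can themselves be turned into a toy keyword matcher, which is how I drill this pattern. Purely a mnemonic sketch, not a real tool; the phrase lists come straight from the exam signals listed in the paragraph above.

```python
# Toy exam-prep mnemonic: scan a scenario question for the signal
# phrases and return the default pick. Illustrative only.
GEMMA_SIGNALS = {"local", "on-device", "open-weight", "fine-tune",
                 "offline", "cost-sensitive"}
GEMINI_SIGNALS = {"multimodal", "video and pdf together", "managed by google"}

def default_pick(question: str) -> str:
    q = question.lower()
    # Multimodal/managed signals override, mirroring the easy filter.
    if any(s in q for s in GEMINI_SIGNALS):
        return "Gemini"
    if any(s in q for s in GEMMA_SIGNALS):
        return "Gemma"
    return "unclear"

print(default_pick("A startup needs an offline, cost-sensitive text analyzer"))
# Gemma
```

Real exam questions obviously need judgment, not string matching, but the keyword buckets are a quick first-pass heuristic.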
I cover Gemma and the rest of Google's first-party model family in my Generative AI Leader course, alongside the rest of the foundational material the exam expects.