
Most of the security questions on the Generative AI Leader exam come down to one idea: when you feed data into an AI system, you are introducing exposure vectors that traditional security models were not designed to handle. Sensitive information can end up embedded in model weights, surfaced in outputs, or sent to an external API without adequate controls. The perimeter is harder to define, and the stakes are higher.
Google Cloud frames data security in the generative AI era as maintaining absolute control over your information assets. That phrasing is deliberately strong. The exam wants you to recognize three foundational approaches for getting there, plus the two Google Cloud tools that implement them in practice.
De-identification removes or masks personally identifiable information before data ever reaches the model. Rather than hoping the model will not reproduce sensitive details, you strip them out at the source. Names, ID numbers, health data, and financial identifiers never enter the pipeline, so the model never has access to them in the first place.
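To make "strip them out at the source" concrete, here is a minimal sketch of redacting identifiers before a prompt ever reaches a model. The regex patterns and the `redact` helper are illustrative assumptions; a production pipeline would rely on a managed inspection service rather than hand-rolled patterns.

```python
import re

# Hypothetical patterns for illustration only -- real de-identification
# uses a managed inspection service, not hand-rolled regexes.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace sensitive identifiers with labels before the text
    enters the AI pipeline -- the model never sees the originals."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact John Doe at john.doe@example.com, SSN 123-45-6789."
print(redact(prompt))  # Contact John Doe at [EMAIL], SSN [SSN].
```

The key property is where the redaction happens: upstream of the model call, so the sensitive values never enter the pipeline at all.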
For the exam, Google Cloud points to two tools as the primary implementations of de-identification:
If a Generative AI Leader question asks which Google Cloud service handles de-identification of sensitive fields before they reach an LLM, those are the two names to recognize.
Minimizing data collection takes a complementary stance: do not collect what you do not need. Every piece of sensitive data that enters your AI pipeline is a liability. If a task can be accomplished with less data, or with aggregated rather than individual-level data, that is the safer path. The exam treats this as an architectural principle, not a tooling question.
Federated learning is the architectural option of the three. Instead of centralizing data in one place for training, the model learns from data where it lives, across distributed devices or systems, without the data ever leaving its source. The insights travel. The sensitive data stays put. On the exam, federated learning is the answer when a scenario rules out moving data across boundaries but still needs a model trained on that data.
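The "insights travel, data stays put" mechanic can be illustrated with a toy federated-averaging sketch. This is a simplification for intuition, not any vendor's implementation: each simulated device computes a local estimate on its own data, and only the estimates and sample counts are sent to the server.

```python
# Toy federated-averaging sketch: each "device" fits a local mean to
# its own private data; only the local estimate and sample count
# travel to the server -- the raw values never leave the device.

local_data = [
    [2.0, 4.0],          # device A's private data
    [6.0],               # device B's private data
    [8.0, 10.0, 12.0],   # device C's private data
]

updates = []
for data in local_data:
    # Local computation happens where the data lives.
    updates.append((sum(data) / len(data), len(data)))

# The server aggregates only the updates, weighted by sample count.
total = sum(n for _, n in updates)
global_estimate = sum(est * n for est, n in updates) / total
print(global_estimate)  # 7.0 -- same as the centralized mean, without centralizing the data
```

The weighted average reproduces exactly what centralized training on the pooled data would compute for this toy case, which is the core promise: the model improves as if it had seen all the data, while the data never crosses the boundary.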
Inside de-identification itself, the exam expects you to know two specific techniques: masking and substitution. The example Google Cloud uses is a record with a name like John Doe, an email, and an SSN. What happens to each field under each technique is the point.
The one-line distinction worth memorizing: masking obscures, substitution replaces. Both ensure the model processes data without ever seeing the actual personal identifiers behind it.
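A small sketch makes the obscures-versus-replaces distinction concrete. The record fields mirror the John Doe example above; the `mask_ssn` and `substitute_email` helpers are hypothetical illustrations, not a specific product's API.

```python
import hashlib

# Hypothetical record, mirroring the John Doe example.
record = {"name": "John Doe", "email": "john.doe@example.com", "ssn": "123-45-6789"}

def mask_ssn(ssn: str) -> str:
    """Masking: obscure the value -- hide all but the last four digits."""
    return "***-**-" + ssn[-4:]

def substitute_email(email: str) -> str:
    """Substitution: replace the value with a stand-in. A hash-derived
    token keeps it consistent: the same input always maps to the same
    token, so joins still work without exposing the real address."""
    token = hashlib.sha256(email.encode()).hexdigest()[:8]
    return f"user_{token}@redacted.example"

safe = {
    "name": "PERSON_1",                          # substitution
    "email": substitute_email(record["email"]),  # substitution
    "ssn": mask_ssn(record["ssn"]),              # masking
}
print(safe["ssn"])  # ***-**-6789
```

Either way, the model processes a record with the same shape as the original while never seeing the actual identifiers, which is exactly the guarantee the exam wants you to associate with de-identification.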
From this section, the Generative AI Leader exam expects you to walk in knowing four things:
My Generative AI Leader course covers data security with LLMs alongside the rest of the foundational material on the exam.