Cloud DLP for the PCA Exam: Masking and Format-Preserving Encryption

GCP Study Hub
January 16, 2026

Cloud Data Loss Prevention, which Google now packages as part of Sensitive Data Protection, is the service I reach for whenever a Professional Cloud Architect scenario involves sensitive data sitting inside Google Cloud. It does three jobs that often get conflated in study material: it discovers sensitive data, it classifies what kind of data it found, and it transforms that data so the wrong eyes cannot read it. The transformation step is where the exam tends to get specific, and the two transformations the Professional Cloud Architect blueprint expects you to recognize are masking and format-preserving encryption.

I want to walk through both transformations carefully because the exam often hands you a scenario and asks which one fits. The trick is that masking and format-preserving encryption look similar at first glance. Both change the visible characters of a credit card number or an email address. The difference is what each one preserves and what each one destroys.

What Cloud DLP actually does

Before getting into the two transformations, I want to anchor the surrounding context. Cloud DLP is built around automated recognition of sensitive data, risk analysis on that data, redaction, and de-identification. The two categories it most often protects are personally identifiable information and protected health information, and on the exam those acronyms appear as PII and PHI.

The service runs as a standalone API, which means I can call it from a Cloud Run service or a Cloud Function and pass in a string to inspect. It also integrates with Cloud Storage, BigQuery, Dataflow, and Pub/Sub, which I will come back to. The compliance angle is what brings DLP into most architecture conversations. GDPR, HIPAA, and similar regimes ask organizations to identify and protect personal data, and Cloud DLP gives me an automated way to find and de-identify that data in support of those requirements.
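To make the standalone API idea concrete, here is a minimal sketch of what an inspection call could look like from a Cloud Run service or a function. The project ID is a placeholder, the helper names build_inspect_request and inspect_text are my own, and the request fields follow the DLP v2 API as I understand it, so check them against the current client library reference before relying on them.

```python
from typing import Any

# Built-in infoType names for the four running examples in this article.
INFO_TYPES = [
    "PERSON_NAME",
    "EMAIL_ADDRESS",
    "CREDIT_CARD_NUMBER",
    "US_SOCIAL_SECURITY_NUMBER",
]


def build_inspect_request(project_id: str, text: str) -> dict[str, Any]:
    """Assemble the request body for an inspect_content call (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {
            "info_types": [{"name": name} for name in INFO_TYPES],
            "include_quote": True,  # ask DLP to return the matched text
        },
        "item": {"value": text},
    }


def inspect_text(project_id: str, text: str):
    """Send the request. Needs google-cloud-dlp installed and valid credentials."""
    # Imported lazily so the request builder above runs without the library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    return client.inspect_content(request=build_inspect_request(project_id, text))
```

Only inspect_text talks to Google Cloud. The builder is plain Python, which makes the shape of the request easy to unit test on its own.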

Masking

Masking replaces parts of the sensitive value with a fixed character, usually an asterisk, while leaving some characters visible. The visible portion is the point. Masking is what I want when an analyst, an agent, or a downstream system needs enough of the original value to keep working but should never see the full value.

The four examples I always come back to are a name, an email address, a credit card number, and a social security number.

  • A name like John Doe becomes J*** D**. The first letter of each word is preserved.
  • An email like johndoe@example.com becomes j*****e@example.com. The first character, the last character of the local part, the at sign, and the domain remain readable.
  • A credit card number like 1234 5678 9012 3456 becomes **** **** **** 3456. Only the last four digits are visible.
  • A social security number like 987-65-4321 becomes ***-**-4321. Same idea, last four digits remain.

The credit card and social security examples match how I have always seen these values displayed in customer support tools. Showing the last four digits gives a human enough information to verify they are looking at the right record without exposing the full value. That is the sweet spot for masking. It is the right choice when troubleshooting, customer verification, or partial analytics matter.
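The four masking examples above can be reproduced with a few lines of plain Python. This is only a sketch of the idea, not how Cloud DLP implements masking, and the function names are hypothetical.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask letters and digits except the last `visible` ones; keep separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def mask_name(name: str, mask_char: str = "*") -> str:
    """Keep the first letter of each word and mask the rest."""
    return " ".join(w[:1] + mask_char * (len(w) - 1) for w in name.split(" "))


def mask_email(email: str, mask_char: str = "*") -> str:
    """Keep the first and last character of the local part, plus the domain."""
    local, at, domain = email.partition("@")
    if len(local) <= 2:
        masked = mask_char * len(local)  # too short to reveal anything safely
    else:
        masked = local[0] + mask_char * (len(local) - 2) + local[-1]
    return masked + at + domain


print(mask_name("John Doe"))                  # J*** D**
print(mask_email("johndoe@example.com"))      # j*****e@example.com
print(mask_keep_last("1234 5678 9012 3456"))  # **** **** **** 3456
print(mask_keep_last("987-65-4321"))          # ***-**-4321
```

The masked characters are simply discarded, so there is no key and no way back to the original value.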

Format-preserving encryption

Format-preserving encryption, written as FPE on most exam questions, also rewrites the value, but its goal is different. FPE produces an output that has the same length and the same character class structure as the input, while no portion of the original value remains readable. The output looks like a real value of the same type, but it is encrypted, not redacted. Because it is encryption, someone who holds the key can recover the original, which masking never allows.

The same four examples make the contrast obvious.

  • John Doe becomes Qelj Tir. Two words, capitalized first letters, same lengths. The format is preserved. The original name is unreadable.
  • johndoe@example.com becomes xylepto@domains.org. Local part, at sign, and domain are all present, structurally valid, and the same length as before. The original email is unreadable.
  • 1234 5678 9012 3456 becomes 8763 2190 4321 6589. Sixteen digits, four groups of four with spaces. The structure is intact. The original number is gone.
  • 987-65-4321 becomes 234-12-7890. Three digits, two digits, four digits, separated by hyphens. Same shape, different value.

Notice how nothing in the FPE output gives away any character of the original. That is the key distinction. Masking preserves a portion of the original value for human reference. FPE preserves only the format so that downstream systems that validate format will keep working. A schema that expects a sixteen digit credit card number does not break, because the encrypted value still looks like a sixteen digit credit card number.
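To show what "same shape, different value" means mechanically, here is a toy format-preserving cipher for digit strings. It is a small Feistel network built on HMAC, which is my own simplification for illustration. It is not the FFX mode that Cloud DLP uses, it has not been reviewed for security, and every name in it is hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed round function: HMAC of the other half, reduced into range."""
    mac = hmac.new(key, f"{round_no}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus


def _feistel_digits(digits: str, key: bytes, decrypt: bool, rounds: int = 8) -> str:
    if len(digits) < 2:
        raise ValueError("need at least two digits to split into halves")
    a = len(digits) // 2
    b = len(digits) - a
    left, right = digits[:a], digits[a:]
    sign = -1 if decrypt else 1
    order = reversed(range(rounds)) if decrypt else range(rounds)
    for i in order:
        if i % 2 == 0:  # even rounds update the left half from the right half
            mod = 10 ** a
            left = str((int(left) + sign * _round_value(key, i, right, mod)) % mod).zfill(a)
        else:           # odd rounds update the right half from the left half
            mod = 10 ** b
            right = str((int(right) + sign * _round_value(key, i, left, mod)) % mod).zfill(b)
    return left + right


def toy_fpe(value: str, key: bytes, decrypt: bool = False) -> str:
    """Encrypt only the digits; spaces and hyphens stay where they were."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    replaced = iter(_feistel_digits(digits, key, decrypt))
    return "".join(next(replaced) if ch in DIGITS else ch for ch in value)


key = b"demo-key-not-for-production"
token = toy_fpe("1234 5678 9012 3456", key)
print(token)                              # sixteen digits in four groups of four
print(toy_fpe(token, key, decrypt=True))  # the original card number again
```

The round trip at the end is the part masking cannot do: with the key, the token maps back to the original, and without it the token is just another well-formed number.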

How to choose between them on the exam

When a Professional Cloud Architect question asks me to pick between masking and FPE, I read for two signals.

The first signal is whether a human needs to read or verify any portion of the original value. Customer support reading the last four of a card. An analyst troubleshooting an email delivery failure. A downstream report that shows partially redacted names. Any of those means masking, because masking is the transformation that intentionally leaks a small, defined portion of the original.

The second signal is whether downstream systems care about format but never need to read the original. A data pipeline that validates credit card length. A test environment that needs realistic looking data without exposing real customers. A machine learning feature store where the model only cares that a value is a sixteen digit number, not what those digits are. Any of those means FPE, because FPE keeps the format intact while making the data unreadable.

If the question describes both requirements, format compatibility plus the need for partial human readability, the answer usually leans toward masking, because masking still produces a value of the same length and shape. FPE wins when the absence of any readable portion of the original is itself a security requirement.
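The two signals and the tie-break can be written down as a tiny decision helper. This is just my reading of the rules above expressed as code, with a hypothetical function name, not an official exam heuristic.

```python
def choose_transformation(human_needs_partial_value: bool,
                          format_must_be_preserved: bool) -> str:
    """Apply the two exam signals in order; masking wins when both are present."""
    if human_needs_partial_value:
        # Signal one: someone must read or verify part of the original.
        return "masking"
    if format_must_be_preserved:
        # Signal two: systems validate the shape but nobody reads the original.
        return "format-preserving encryption"
    return "neither signal present; re-read the scenario"


print(choose_transformation(True, False))   # masking
print(choose_transformation(False, True))   # format-preserving encryption
print(choose_transformation(True, True))    # masking
```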

Where Cloud DLP plugs into the rest of Google Cloud

The Professional Cloud Architect exam will sometimes test the same masking and FPE concepts but wrap them in a service integration question. Cloud DLP integrates with services across both data at rest and data in transit, and the integration points are worth memorizing because they map directly to common architectures.

For data at rest, Cloud DLP integrates with Cloud Storage and BigQuery. In Cloud Storage, it scans file-based data: text files, CSVs, JSON files, the kinds of objects that pile up in buckets without anyone tracking what is inside them. In BigQuery, it scans columns inside relational tables. If a column contains PII or PHI, DLP can identify it and redact it.

For data in transit, Cloud DLP integrates with Dataflow and Pub/Sub. In Dataflow, I can inspect and redact data inside both streaming and batch pipelines, which means I can transform sensitive data before it ever reaches its destination. In Pub/Sub, I can scan and redact messages as they flow through topics, typically by having a subscriber such as a Dataflow job call DLP, which protects data inside event-driven architectures where the same message might fan out to several subscribers.

The integration story matters because it tells me where to place the DLP call. If the architecture diagram shows files landing in a bucket, the answer involves Cloud Storage scanning. If the diagram shows a streaming pipeline, the answer involves Dataflow inspection. If the diagram shows tables, the answer involves BigQuery column scanning. The transformation, masking or FPE, is the same regardless of where I call DLP from. The integration just tells me which service hosts the data.
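One way to see that the transformation is independent of the integration is to look at the de-identify request itself. The sketch below builds the same request body with either a masking or an FPE transformation. The field names follow the DLP v2 API as I understand it, the helper names are hypothetical, and the wrapped key and KMS key name are placeholders, so treat this as a starting point to verify against the API reference.

```python
from typing import Any


def mask_card_transformation() -> dict[str, Any]:
    """Mask the first twelve digits of a card, skipping spaces and hyphens."""
    return {
        "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
        "primitive_transformation": {
            "character_mask_config": {
                "masking_character": "*",
                "number_to_mask": 12,
                "characters_to_ignore": [{"characters_to_skip": " -"}],
            }
        },
    }


def fpe_card_transformation(wrapped_key: bytes, kms_key_name: str) -> dict[str, Any]:
    """FPE over a numeric alphabet. Assumes the detected value is digits only."""
    return {
        "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
        "primitive_transformation": {
            "crypto_replace_ffx_fpe_config": {
                "crypto_key": {
                    "kms_wrapped": {
                        "wrapped_key": wrapped_key,       # placeholder bytes
                        "crypto_key_name": kms_key_name,  # placeholder resource name
                    }
                },
                "common_alphabet": "NUMERIC",
            }
        },
    }


def build_deidentify_request(project_id: str, text: str,
                             transformation: dict[str, Any]) -> dict[str, Any]:
    """Same request shape whichever service ends up making the call."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_config": {"info_types": transformation["info_types"]},
        "deidentify_config": {
            "info_type_transformations": {"transformations": [transformation]}
        },
        "item": {"value": text},
    }
```

A request built this way would be passed to the client's deidentify_content method. Swapping mask_card_transformation for fpe_card_transformation changes only the primitive_transformation entry and nothing about where the call runs.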

What I want you to remember

Cloud DLP is the de-identification engine. Masking partially redacts a value while preserving a portion of the original for human reference. Format-preserving encryption rewrites the value entirely while preserving its length and character class structure for downstream compatibility. The integrations with Cloud Storage, BigQuery, Dataflow, and Pub/Sub determine where the inspection happens, but the transformation choice is its own decision.

If you want to see this material covered alongside the rest of the security material, the full Professional Cloud Architect course at gcpstudyhub.com walks through Cloud DLP, IAM, VPC Service Controls, Cloud KMS, and the rest of the security surface area you need for the exam.
