Preventing Vertex AI Data Exfiltration with VPC Service Controls for the PCA Exam

GCP Study Hub
Ben Makansi
February 4, 2026

One of the more specific scenarios I have seen show up on the Professional Cloud Architect exam is preventing data exfiltration from a Vertex AI training workflow. The setup is nearly always the same: a team is training models on sensitive data (customer records, financial data, or proprietary business information), and that data lives in Cloud Storage. The question asks how to make sure nobody can copy the data out to an external project or pull it through an unauthorized service.

The answer is VPC Service Controls, but with a twist that trips people up. You have to know which API names to put inside the perimeter, and the obvious one is not enough on its own.

Why Vertex AI Training Is an Exfiltration Target

In the workflow these questions describe, Vertex AI training jobs read their input data from Cloud Storage buckets. You upload your training dataset to a bucket, point the training job at the bucket, and Vertex AI pulls the data in to train the model. The model artifacts also typically land back in a bucket when training finishes.

That setup means the sensitive data has two homes worth protecting: it sits in Cloud Storage at rest, and it flows through Vertex AI during training. An attacker or a careless insider with the right permissions could exfiltrate the data from either place. They could copy the bucket contents to a bucket in an external project, or they could call the Vertex AI API from an unauthorized context and pull data out through there. Identity and Access Management on its own does not stop these scenarios cleanly, because IAM controls who can act on the data, not where the data can go; the same credentials that legitimately read the bucket can be misused to move it elsewhere.

How VPC Service Controls Solves This

VPC Service Controls draws a security boundary, called a service perimeter, around a set of projects and restricts access to the Google Cloud services you list for them. Inside the perimeter, those services can talk to each other normally. Requests that cross the boundary, whether they come in from outside or send data out, are blocked by default even if the caller has valid IAM permissions on the resource.

For the Vertex AI exfiltration scenario, the perimeter wraps around both the project running the training job and the project holding the Cloud Storage buckets, which can be the same project. Inside that perimeter, the Vertex AI training job reads from Cloud Storage as it always did. Across the boundary, three things stop working: an external project trying to pull data from the protected bucket is blocked, copying the bucket contents into a bucket in an external project is blocked, and a request to the Vertex AI API from outside the perimeter is blocked. The boundary holds even when the IAM grants would otherwise allow the call.
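To make the boundary logic concrete, here is a toy Python model of the decision a perimeter makes. It is a simplified sketch, not how Google actually enforces VPC Service Controls: it ignores access levels, ingress and egress rules, and network origin, and the project IDs (ml-training, ml-data, external-proj) are hypothetical. It shows the key property: a restricted-service call that crosses the boundary is denied even when IAM would allow it.

```python
from dataclasses import dataclass


@dataclass
class Perimeter:
    """Toy model of a VPC Service Controls perimeter (not the real enforcement logic)."""

    projects: set[str]             # projects inside the perimeter
    restricted_services: set[str]  # APIs the perimeter guards

    def allows(self, caller_project: str, target_project: str, service: str, iam_ok: bool) -> bool:
        # IAM is still evaluated: without a grant, the call fails regardless of the perimeter.
        if not iam_ok:
            return False
        # Calls to services the perimeter does not restrict are not guarded by it.
        if service not in self.restricted_services:
            return True
        # A restricted call is allowed only if caller and target are on the same side
        # of the boundary; crossing it in either direction is denied.
        caller_inside = caller_project in self.projects
        target_inside = target_project in self.projects
        return caller_inside == target_inside


perimeter = Perimeter(
    projects={"ml-training", "ml-data"},  # hypothetical project IDs
    restricted_services={"aiplatform.googleapis.com", "storage.googleapis.com"},
)

# Training job inside the perimeter reads the training bucket: allowed.
print(perimeter.allows("ml-training", "ml-data", "storage.googleapis.com", iam_ok=True))  # True
# External project pulls from the protected bucket, even with a valid IAM grant: blocked.
print(perimeter.allows("external-proj", "ml-data", "storage.googleapis.com", iam_ok=True))  # False
# Copying bucket contents into an external project's bucket: blocked.
print(perimeter.allows("ml-data", "external-proj", "storage.googleapis.com", iam_ok=True))  # False
# Vertex AI API called from outside the perimeter: blocked.
print(perimeter.allows("external-proj", "ml-training", "aiplatform.googleapis.com", iam_ok=True))  # False
```

The `caller_inside == target_inside` comparison is the whole idea in one line: the perimeter does not care who you are, only which side of the boundary the request starts and ends on.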

The Two API Names You Have to Know

Here is the part the exam is actually testing. When you set up a VPC Service Controls perimeter to protect a Vertex AI training workflow, you have to add both of these to the perimeter's restricted services:

  • aiplatform.googleapis.com for Vertex AI itself.
  • storage.googleapis.com for Cloud Storage.
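As a sketch of what that configuration looks like, the Python below builds a perimeter definition as a plain dictionary. The shape is modeled on the Access Context Manager API's ServicePerimeter resource (`name`, `title`, `perimeterType`, and `status` with `resources` and `restrictedServices`); treat the field names as an assumption to verify against the API reference. The policy ID, perimeter name, and project numbers are hypothetical placeholders. The function only builds the dictionary and makes no API call.

```python
import json

# Hypothetical identifiers: replace with your own access policy ID and project numbers.
POLICY_ID = "123456789"
TRAINING_PROJECT_NUMBER = "111111111111"
DATA_PROJECT_NUMBER = "222222222222"


def vertex_training_perimeter(policy_id: str, project_numbers: list[str]) -> dict:
    """Build a perimeter body shaped like an Access Context Manager ServicePerimeter."""
    return {
        "name": f"accessPolicies/{policy_id}/servicePerimeters/vertex_training",
        "title": "vertex_training",
        "perimeterType": "PERIMETER_TYPE_REGULAR",
        "status": {
            # Projects are listed by number; set() handles the same-project case.
            "resources": [f"projects/{n}" for n in sorted(set(project_numbers))],
            # Both APIs must be restricted, or the bucket stays reachable from outside.
            "restrictedServices": [
                "aiplatform.googleapis.com",
                "storage.googleapis.com",
            ],
        },
    }


body = vertex_training_perimeter(POLICY_ID, [TRAINING_PROJECT_NUMBER, DATA_PROJECT_NUMBER])
print(json.dumps(body, indent=2))
```

Keeping the two service names together in one literal list makes it hard to ship a perimeter that restricts one API and forgets the other.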

It is tempting to think that protecting Vertex AI means putting aiplatform.googleapis.com in the perimeter and being done. That is not enough. Cloud Storage is doing the actual data hosting in the background, so an attacker who cannot reach Vertex AI from outside the perimeter could still pull the training data straight out of the bucket through storage.googleapis.com. Both APIs must be inside the same perimeter for the protection to be complete.
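A quick way to see the trap is to list which data-touching APIs a proposed perimeter leaves unrestricted. This minimal Python check assumes, as the paragraph above does, that the training data is reachable through exactly two APIs; the function name is illustrative.

```python
# The two APIs through which the training data can be reached in this workflow.
DATA_TOUCHING_APIS = ("aiplatform.googleapis.com", "storage.googleapis.com")


def open_exfiltration_paths(restricted_services: set[str]) -> list[str]:
    """Return the data-touching APIs that a perimeter leaves unrestricted."""
    return sorted(api for api in DATA_TOUCHING_APIS if api not in restricted_services)


# Perimeter restricting only Vertex AI: the bucket is still reachable via Cloud Storage.
print(open_exfiltration_paths({"aiplatform.googleapis.com"}))  # ['storage.googleapis.com']
# Perimeter restricting both APIs: no path left open in this model.
print(open_exfiltration_paths({"aiplatform.googleapis.com", "storage.googleapis.com"}))  # []
```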

This is the kind of detail the Professional Cloud Architect exam is fond of. The high-level concept is easy. The specific configuration that makes the high-level concept actually work is what gets tested.

What the Exam Question Looks Like

An exam question on this scenario will describe a Vertex AI training pipeline operating on sensitive data in Cloud Storage and ask how to prevent the data from being copied to an external project or accessed by an unauthorized service. The right answer will involve VPC Service Controls. The distractors will usually involve IAM-only solutions, organization policies on bucket access, or a perimeter that only includes one of the two API names.

If an answer choice mentions VPC Service Controls but only references aiplatform.googleapis.com without storage.googleapis.com, that choice is a trap. If a choice mentions only storage.googleapis.com without Vertex AI, that is also a trap. The correct answer puts both into the perimeter so neither path out of the boundary is left open.

The Pattern Generalizes

The same logic applies to any Google Cloud service that depends on Cloud Storage as a backing data store, such as BigQuery exports to buckets, Dataflow jobs that read from buckets, and AutoML workflows. Wrapping just the front-facing service in a VPC Service Controls perimeter is incomplete protection if the data actually lives in Cloud Storage. The perimeter has to cover both the service the user calls and the service that holds the data; otherwise an attacker can route around the protection.
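The generalization can be sketched as a small closure: start from the APIs users call and add every API that holds their data. The dependency map in this Python sketch is hypothetical and illustrative, built only from the examples above; it is not an official list, so map your own workflows before relying on it.

```python
# Hypothetical, illustrative map from the API a user calls to the API(s) that hold
# the data it reads or writes. Not an official dependency list.
BACKING_SERVICES: dict[str, set[str]] = {
    "aiplatform.googleapis.com": {"storage.googleapis.com"},  # training data in buckets
    "dataflow.googleapis.com": {"storage.googleapis.com"},    # pipelines reading buckets
    "bigquery.googleapis.com": {"storage.googleapis.com"},    # exports to buckets
}


def required_restricted_services(front_services: set[str]) -> set[str]:
    """Return the front-facing APIs plus every backing API they depend on for data."""
    required = set(front_services)
    for service in front_services:
        # Unknown services contribute no extra dependencies in this sketch.
        required |= BACKING_SERVICES.get(service, set())
    return required


print(sorted(required_restricted_services({"bigquery.googleapis.com"})))
# ['bigquery.googleapis.com', 'storage.googleapis.com']
```

The map is one level deep on purpose; a real inventory may need to follow chains of dependencies, which would turn this into a graph traversal.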

For the Professional Cloud Architect exam, the Vertex AI version of this pattern is the one I have seen most often. Remember the two API names, remember that both have to be inside the same perimeter, and you can dispatch this question quickly.

My Professional Cloud Architect course covers VPC Service Controls for Vertex AI alongside the rest of the advanced architecture material.
