Private Google Access for the PDE Exam

GCP Study Hub
May 6, 2026

Private Google Access is one of those networking features that quietly shows up across a lot of Professional Data Engineer scenarios. The setup is almost always the same. A security team has locked down external IPs across the project, and now a data pipeline needs to write to BigQuery or read from Cloud Storage from a worker that has no public address. If you do not know what knob to turn, the workload looks broken. If you do, the fix is a single subnet-level toggle.

I want to walk through how this feature actually works, where it intersects with the data engineering services you care about, and the specific gotchas that tend to surface on the Professional Data Engineer exam.

What Private Google Access actually does

By default, a VM in Google Cloud reaches Google APIs like BigQuery, Cloud Storage, Pub/Sub, or Dataflow's control plane the same way any other client on the internet does. It sends traffic out to a public endpoint such as bigquery.googleapis.com over its external IP. If the VM has no external IP, that traffic has nowhere to go and the call fails.

Private Google Access changes that. When you enable it on a subnet, any VM in that subnet with only an internal IP can still reach Google APIs and managed services. The traffic never traverses the public internet. It stays inside Google's network and exits through a private path to the API endpoint. From the workload's perspective, nothing about the code changes. A gsutil cp or a BigQuery client call works exactly the same. The only difference is that the packets take an internal route.

This is a per-subnet setting, not a per-VM or per-project setting. You enable it on the subnet, and every VM inside that subnet inherits the behavior. A different subnet in the same VPC can have it disabled, and VMs there will need external IPs to reach Google APIs.
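
As a concrete sketch, enabling and verifying the setting takes two commands. The subnet, region, and other names here are placeholders:

    # Enable Private Google Access on an existing subnet
    gcloud compute networks subnets update pipeline-subnet \
        --region=us-central1 \
        --enable-private-ip-google-access

    # Confirm the flag took effect (prints True when enabled)
    gcloud compute networks subnets describe pipeline-subnet \
        --region=us-central1 \
        --format="value(privateIpGoogleAccess)"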

Why this matters for data engineering workloads

Most of the managed data services on Google Cloud run worker VMs inside your VPC. Dataflow workers, Dataproc nodes, Cloud Composer's GKE node pool, and Vertex AI training jobs all spin up compute in a subnet you specify. In a hardened environment, those workers are typically launched with no external IP. The org policy constraints/compute.vmExternalIpAccess often enforces this at the organization level.
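
If you want to check whether that constraint is in force before debugging a stuck pipeline, something along these lines should work; the project name is a placeholder:

    # Show the effective org policy for external IP access on this project
    gcloud resource-manager org-policies describe \
        compute.vmExternalIpAccess --project=my-project --effective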

Without Private Google Access, a Dataflow job in this kind of environment cannot reach the Dataflow service to report progress, cannot read from Cloud Storage, and cannot write to BigQuery. The job will fail to start or hang. The same is true for a Composer environment whose GKE workers cannot reach the Composer control plane or pull operators that talk to Google APIs.

The fix is to enable Private Google Access on the subnet where those workers run. For Dataflow specifically, you also pass --no_use_public_ips (or its equivalent flag) when launching the job so that the workers are created without external IPs in the first place.
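
A minimal sketch of such a launch for a Beam Python pipeline, assuming placeholder project, bucket, and subnet names:

    # Launch a Dataflow job whose workers get internal IPs only.
    # The Java equivalent of --no_use_public_ips is --usePublicIps=false.
    python my_pipeline.py \
        --runner=DataflowRunner \
        --project=my-project \
        --region=us-central1 \
        --temp_location=gs://my-bucket/tmp \
        --subnetwork=regions/us-central1/subnetworks/pipeline-subnet \
        --no_use_public_ips

The subnet passed to --subnetwork must be the one where Private Google Access is enabled.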

The two private endpoints: private.googleapis.com and restricted.googleapis.com

This is the part of Private Google Access that tends to trip people up on exam questions. There are two DNS names you can route Google API traffic through internally, and they behave differently.

  • private.googleapis.com resolves to a dedicated range (199.36.153.8/30) that is never advertised on the public internet. It lets you reach all Google APIs from internal-only VMs, or from on-prem over VPN or Interconnect. This is the standard target for Private Google Access in most setups.
  • restricted.googleapis.com resolves to a different range (199.36.153.4/30) and only serves the Google APIs that support VPC Service Controls. If you have VPC-SC configured to protect BigQuery and Cloud Storage, you want workloads to hit restricted.googleapis.com so that requests are evaluated by the perimeter (see the route sketch just below).
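
Hardened VPCs often delete the default internet route entirely, in which case you add a route covering just the restricted range. A sketch with placeholder names:

    # Traffic to 199.36.153.4/30 never leaves Google's network,
    # even though the next hop is named default-internet-gateway.
    gcloud compute routes create restricted-apis \
        --network=my-vpc \
        --destination-range=199.36.153.4/30 \
        --next-hop-gateway=default-internet-gateway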

On the exam, if the scenario mentions VPC Service Controls and asks how internal workloads should reach BigQuery, the answer involves restricted.googleapis.com. If the scenario is just about removing external IPs without mentioning a perimeter, private.googleapis.com is the right target.

What you actually configure

There are three pieces that have to line up for this to work.

  • Enable Private Google Access on the subnet. This is a single checkbox in the console or --enable-private-ip-google-access on gcloud compute networks subnets update.
  • Make sure the VPC has a route to the default internet gateway that covers the API traffic (the standard 0.0.0.0/0 default route qualifies). Private Google Access still uses that route; the traffic simply terminates inside Google's network instead of leaving it.
  • If you are routing through private.googleapis.com or restricted.googleapis.com explicitly, set up Cloud DNS private zones for googleapis.com so that the relevant hostnames resolve to the internal IP ranges from inside the VPC.

The DNS piece is the one that gets forgotten most often. Without it, the VM resolves bigquery.googleapis.com to its ordinary public address, and in a hardened VPC where the default internet route has been removed, that traffic has nowhere to go.
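
A minimal sketch of that DNS setup, assuming a VPC named my-vpc and the private.googleapis.com range; the restricted.googleapis.com variant just swaps the hostname and addresses:

    # Private zone that overrides googleapis.com inside the VPC
    gcloud dns managed-zones create googleapis-private \
        --dns-name=googleapis.com. \
        --visibility=private \
        --networks=my-vpc \
        --description="Route Google API calls through Private Google Access"

    # Point private.googleapis.com at the four addresses in 199.36.153.8/30
    gcloud dns record-sets create private.googleapis.com. \
        --zone=googleapis-private --type=A --ttl=300 \
        --rrdatas=199.36.153.8,199.36.153.9,199.36.153.10,199.36.153.11

    # Send every other *.googleapis.com hostname to the same place
    gcloud dns record-sets create "*.googleapis.com." \
        --zone=googleapis-private --type=CNAME --ttl=300 \
        --rrdatas=private.googleapis.com.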

How this shows up on the Professional Data Engineer exam

The pattern to recognize is almost always the same. A scenario describes a data pipeline running on Dataflow, Dataproc, or Composer. Security requirements forbid external IPs. The pipeline needs to read from Cloud Storage or write to BigQuery. The question asks what change makes the workload functional without violating the security policy.

The answer is to enable Private Google Access on the subnet hosting the workers. If VPC Service Controls is also in play, the answer expands to include routing through restricted.googleapis.com. Distractor options usually include adding external IPs back, opening firewall rules to the public internet, or proxying through a NAT gateway. None of those are correct in a hardened environment.

My Professional Data Engineer course covers Private Google Access alongside the rest of the networking surface area you need for the exam, including VPC Service Controls, shared VPC, and the specific worker configuration for Dataflow and Composer.
