
Private Google Access is one of those networking features that quietly shows up across a lot of Professional Data Engineer scenarios. The setup is almost always the same. A security team has locked down external IPs across the project, and now a data pipeline needs to write to BigQuery or read from Cloud Storage from a worker that has no public address. If you do not know what knob to turn, the workload looks broken. If you do, the fix is a single subnet-level toggle.
I want to walk through how this feature actually works, where it intersects with the data engineering services you care about, and the specific gotchas that tend to surface on the Professional Data Engineer exam.
By default, a VM in Google Cloud reaches Google APIs like BigQuery, Cloud Storage, Pub/Sub, or Dataflow's control plane the same way any other client on the internet does. It sends traffic out to a public endpoint such as bigquery.googleapis.com over its external IP. If the VM has no external IP, that traffic has nowhere to go and the call fails.
Private Google Access changes that. When you enable it on a subnet, any VM in that subnet with only an internal IP can still reach Google APIs and managed services. The traffic never traverses the public internet. It stays inside Google's network and exits through a private path to the API endpoint. From the workload's perspective, nothing about the code changes. A gsutil cp or a BigQuery client call works exactly the same. The only difference is that the packets take an internal route.
This is a per-subnet setting, not a per-VM or per-project setting. You enable it on the subnet, and every VM inside that subnet inherits the behavior. A different subnet in the same VPC can have it disabled, and VMs there will need external IPs to reach Google APIs.
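Enabling it is a one-line change per subnet. A minimal sketch, assuming hypothetical subnet and region names (`my-subnet`, `us-central1`) that you would substitute with your own:

```shell
# Turn on Private Google Access for one subnet.
gcloud compute networks subnets update my-subnet \
    --region=us-central1 \
    --enable-private-ip-google-access

# Verify the setting took effect.
gcloud compute networks subnets describe my-subnet \
    --region=us-central1 \
    --format="value(privateIpGoogleAccess)"
```

Note that this must be repeated for every subnet that hosts internal-only workers; enabling it on one subnet does nothing for its neighbors.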
Most of the managed data services on Google Cloud run worker VMs inside your VPC. Dataflow workers, Dataproc nodes, Cloud Composer's GKE node pool, and Vertex AI training jobs all spin up compute in a subnet you specify. In a hardened environment, those workers are typically launched with no external IP. The org policy constraints/compute.vmExternalIpAccess often enforces this at the organization level.
Without Private Google Access, a Dataflow job in this kind of environment cannot reach the Dataflow service to report progress, cannot read from Cloud Storage, and cannot write to BigQuery. The job will fail to start or hang. The same is true for a Composer environment whose GKE workers cannot reach the Composer control plane or pull operators that talk to Google APIs.
The fix is to enable Private Google Access on the subnet where those workers run. For Dataflow specifically, you also pass --no_use_public_ips (for Python pipelines; Java pipelines use --usePublicIps=false) when launching the job so that the workers are created without external IPs in the first place.
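A launch sketch for a Python pipeline, with placeholder project, bucket, and subnet names; the flags shown are the standard Dataflow pipeline options:

```shell
# Launch a Dataflow job whose workers get internal IPs only.
# The subnet referenced here must have Private Google Access enabled.
python my_pipeline.py \
    --runner=DataflowRunner \
    --project=my-project \
    --region=us-central1 \
    --temp_location=gs://my-bucket/tmp \
    --subnetwork=regions/us-central1/subnetworks/my-subnet \
    --no_use_public_ips
```

The --subnetwork flag matters here: it pins the workers to the subnet you hardened, rather than whatever default subnet the service would otherwise pick.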
This is the part of Private Google Access that tends to trip people up on exam questions. There are two DNS names you can route Google API traffic through internally, and they behave differently.
The first, private.googleapis.com, resolves to 199.36.153.8/30 and fronts nearly every Google API. The second, restricted.googleapis.com, resolves to 199.36.153.4/30 and serves only the APIs that support VPC Service Controls, which makes it the right target inside a service perimeter: a workload routed through the restricted range cannot reach APIs that would let data slip past the perimeter.

On the exam, if the scenario mentions VPC Service Controls and asks how internal workloads should reach BigQuery, the answer involves restricted.googleapis.com. If the scenario is just about removing external IPs without mentioning a perimeter, private.googleapis.com is the right target.
There are three pieces that have to line up for this to work. First, the subnet setting itself: --enable-private-ip-google-access on gcloud compute networks subnets update, or the equivalent checkbox in the console. Second, routing: the VPC needs a route to the chosen API range whose next hop is the default internet gateway, because hardened VPCs often delete the 0.0.0.0/0 default route that would otherwise carry the traffic. Third, DNS: a private Cloud DNS zone for googleapis.com so that the relevant hostnames resolve to the internal IP ranges from inside the VPC.

The DNS piece is the one that gets forgotten most often. Without it, the VM resolves bigquery.googleapis.com to a public IP, which it cannot reach without an external address.
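The routing and DNS pieces can be sketched with gcloud. This assumes the restricted range and placeholder network names (`my-vpc`), which you would adapt; swap in 199.36.153.8/30 and private.googleapis.com if you are not using VPC Service Controls:

```shell
# Route the restricted API range through Google's network.
# Needed when the default internet route has been removed.
gcloud compute routes create restricted-apis \
    --network=my-vpc \
    --destination-range=199.36.153.4/30 \
    --next-hop-gateway=default-internet-gateway

# Private DNS zone so googleapis.com hostnames resolve internally.
gcloud dns managed-zones create googleapis \
    --description="Route Google APIs to restricted VIP" \
    --dns-name=googleapis.com. \
    --visibility=private \
    --networks=my-vpc

# A record for the restricted endpoint itself.
gcloud dns record-sets create restricted.googleapis.com. \
    --zone=googleapis --type=A --ttl=300 \
    --rrdatas=199.36.153.4,199.36.153.5,199.36.153.6,199.36.153.7

# Wildcard CNAME so bigquery.googleapis.com, storage.googleapis.com,
# and the rest all resolve through the restricted range.
gcloud dns record-sets create "*.googleapis.com." \
    --zone=googleapis --type=CNAME --ttl=300 \
    --rrdatas=restricted.googleapis.com.
```

The wildcard CNAME is the step that ties it together: it is what makes an unmodified BigQuery client, which only knows the public hostname, end up on the internal path.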
The pattern to recognize is almost always the same. A scenario describes a data pipeline running on Dataflow, Dataproc, or Composer. Security requirements forbid external IPs. The pipeline needs to read from Cloud Storage or write to BigQuery. The question asks what change makes the workload functional without violating the security policy.
The answer is to enable Private Google Access on the subnet hosting the workers. If VPC Service Controls is also in play, the answer expands to include routing through restricted.googleapis.com. Distractor options usually include adding external IPs back, opening firewall rules to the public internet, or proxying through a NAT gateway. None of those are correct in a hardened environment.
My Professional Data Engineer course covers Private Google Access alongside the rest of the networking surface area you need for the exam, including VPC Service Controls, shared VPC, and the specific worker configuration for Dataflow and Composer.