Pub/Sub IoT Gateway Pattern for the PDE Exam

GCP Study Hub
September 13, 2025

One of the patterns I make sure every candidate locks in before sitting the Google Cloud Professional Data Engineer exam is the IoT gateway feeding Pub/Sub. It is a narrow topic, but it shows up often enough in scenario questions that missing it is an easy way to give back points. The good news is that once you understand what the gateway does and why Pub/Sub sits behind it, the answer choices on the exam usually pick themselves.

Picture a fleet of devices generating data continuously: thousands of tractors out in fields, wearable health monitors on patients, smart home sensors, connected vehicles on the road. Each one produces a steady stream of telemetry that has to get to the cloud for processing or storage. You could try to have every device talk directly to Pub/Sub, but that falls apart quickly in the field. Devices drop off the network. Cellular connections flap. Some sensors only have enough power to send short bursts. You need a piece of hardware or software sitting in the middle that collects all of that data, holds onto it when the link is unstable, and forwards it once a clean connection is available. That middle layer is the IoT gateway.

What the IoT gateway actually does

The IoT gateway bridges the gap between IoT devices and cloud services like Pub/Sub. It plays two roles that matter for the Professional Data Engineer exam.

  • Data aggregation. The gateway collects and processes data from many devices at once. Even when individual devices have intermittent connectivity, the gateway keeps pulling their data in locally so nothing is dropped at the edge.
  • Reliable transmission. The gateway buffers data on local storage and only sends it upstream when a stable connection to the cloud is available. That buffering is what prevents data loss when the network is flaky.

Once the gateway has data to send, it pushes that data into Pub/Sub. Pub/Sub then acts as the messaging backbone, distributing those messages to whichever downstream services need them.
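
The buffer-then-forward behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name StoreAndForwardBuffer, the publish callback, and the ConnectionError signal are all hypothetical and not part of any Google Cloud library, and the buffer lives in memory here, whereas a real gateway would persist it to local storage.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForwardBuffer:
    """Hypothetical edge buffer: holds readings until the uplink is usable."""

    def __init__(self, publish: Callable[[bytes], None], max_items: int = 10_000):
        # Assumption: publish raises ConnectionError when the uplink is down.
        self._publish = publish
        self._pending: Deque[bytes] = deque()
        self._max_items = max_items

    def collect(self, reading: bytes) -> None:
        """Accept a reading from a device, regardless of uplink state."""
        if len(self._pending) >= self._max_items:
            raise OverflowError("local buffer full")
        self._pending.append(reading)

    def flush(self) -> int:
        """Forward buffered readings in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._publish(self._pending[0])
            except ConnectionError:
                break  # link is down; keep the remaining readings for later
            self._pending.popleft()  # remove only after a successful publish
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of readings still waiting to be forwarded."""
        return len(self._pending)
```

A reading leaves the buffer only after the publish call returns without error, so an outage in the middle of a flush leaves the unsent readings in place for the next attempt.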

Why Pub/Sub sits behind the gateway

Pub/Sub is the right destination for gateway traffic because it absorbs spikes, decouples producers from consumers, and scales to handle messages from thousands or millions of devices without you provisioning anything. The gateway does not need to know what is reading the messages. It just publishes to a topic.

From the topic, the data fans out to the services that actually do the work.

  • Cloud Dataflow for transformations and real-time analytics on the streaming data.
  • BigQuery for long-term storage and querying, often as the warehouse where the cleaned data lands.
  • Cloud Storage for raw archival of the events if you want to replay them later.

That fan-out pattern is the part that makes the architecture scalable. Add more devices, add more gateways, and Pub/Sub keeps absorbing the load. The downstream services scale independently based on what they each need to do.
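
To make the fan-out concrete, here is a toy in-memory model of a topic with several subscriptions. It is a sketch of the idea only, not the Pub/Sub client library: InMemoryTopic and its methods are hypothetical names, and delivery details such as acknowledgements and retries are left out.

```python
from typing import Dict, List


class InMemoryTopic:
    """Toy topic: every subscription receives its own copy of each message."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[bytes]] = {}

    def subscribe(self, name: str) -> None:
        """Create a subscription; it only sees messages published afterwards."""
        self._queues.setdefault(name, [])

    def publish(self, data: bytes) -> None:
        """The publisher does not know or care who the subscribers are."""
        for queue in self._queues.values():
            queue.append(data)

    def pull(self, name: str) -> List[bytes]:
        """Return and clear the backlog for one subscription."""
        messages, self._queues[name] = self._queues[name], []
        return messages


topic = InMemoryTopic()
topic.subscribe("streaming-transforms")  # hypothetical consumer names
topic.subscribe("raw-archive")
topic.publish(b'{"device": "tractor-17", "temp_c": 41.2}')

# Each consumer drains its own backlog independently of the other.
transform_batch = topic.pull("streaming-transforms")
archive_batch = topic.pull("raw-archive")
```

The gateway corresponds to the single publish call. Adding a third consumer means adding a subscription, with no change on the publishing side.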

How this shows up on the Professional Data Engineer exam

The exam loves scenarios that describe a large IoT deployment and then ask you to pick the right ingestion architecture. The tells are almost always the same. You will see language about a large number of distributed devices, intermittent or unreliable connectivity, and a need to process or store the data once it reaches Google Cloud.

When you see that combination, the answer almost always involves the gateway plus Pub/Sub pattern, with Dataflow handling the streaming transforms and BigQuery or Cloud Storage on the back end. Distractors will try to pull you toward having devices write directly to BigQuery or Cloud Storage, or toward putting a Compute Engine VM in the middle to do custom buffering. Those answers ignore the role the gateway plays at the edge and ignore why Pub/Sub is the right decoupling layer.

A few specifics worth keeping in your head walking into the exam.

  • The gateway handles buffering at the edge. If a question stresses that data must not be lost during network outages, the gateway is doing that work, not Pub/Sub. Pub/Sub can only retain a message once it has actually been published.
  • Pub/Sub is the messaging backbone, not the processor. If transformations are required, Dataflow comes next.
  • The pattern scales to millions of devices because Pub/Sub is fully managed and elastic. You are not sizing brokers or partitions.

That is really the whole pattern. Devices talk to a gateway. The gateway aggregates and buffers. The gateway publishes to Pub/Sub. Pub/Sub feeds Dataflow, BigQuery, or Cloud Storage depending on whether you need streaming transforms, querying, or archival. Once you can draw that diagram from memory, IoT scenarios on the Professional Data Engineer exam stop being scary.

My Professional Data Engineer course covers the Pub/Sub IoT gateway pattern alongside the rest of the streaming ingestion topics you need for exam day.
