Pub/Sub Intro for the PDE Exam: Tightly vs Loosely Coupled Messaging

GCP Study Hub
August 20, 2025

One of the first messaging concepts the Professional Data Engineer exam expects you to understand is the difference between a tightly coupled system and a loosely coupled system. Pub/Sub is on the exam because it is the canonical example of the loosely coupled pattern in Google Cloud, and a lot of data pipeline questions hinge on whether you spot the need for a buffer in the middle of a workflow. I want to walk through how I frame this for candidates so the underlying intuition sticks, then layer on the specifics of Pub/Sub itself.

Tightly coupled vs loosely coupled messaging

Picture a sender and a receiver. The sender might be a database, an application, or a device emitting telemetry. The receiver is whatever system needs to act on that data. In a tightly coupled setup, the sender talks directly to the receiver. Every message that gets sent has to be processed by the receiver in that moment. There is no flexibility and no buffer. It is a direct line from point A to point B.

The problem with that design shows up the moment the receiver hits trouble. If the receiver is down, slow, or overwhelmed, the whole communication chain breaks. The sender has nowhere to put the messages, so they get lost or the sender itself starts failing. Tightly coupled systems are brittle, and they do not scale well when traffic spikes.

A loosely coupled system inserts an intermediary between the sender and the receiver. We call this intermediary a message bus, or a buffer. Messages get placed into a queue on the bus. The sender can keep producing messages as fast as it wants, without caring whether the receiver is ready. If the receiver is temporarily offline, the messages wait in the queue, and the receiver picks them up when it comes back online. The system becomes fault tolerant and scalable because the two halves are decoupled.
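
To see the pattern in miniature, here is a toy sketch using Python's standard-library queue as the buffer. It is not Pub/Sub, just the decoupling idea: the sender finishes publishing before the receiver even wakes up, and nothing is lost.

```python
import queue
import threading
import time

# A toy message bus: sender and receiver only share this buffer.
bus = queue.Queue()

def sender():
    # The sender produces at its own pace, never waiting on the receiver.
    for i in range(5):
        bus.put(f"message-{i}")
        print(f"sent message-{i}")
        time.sleep(0.1)

def receiver():
    # Simulate the receiver being offline while messages accumulate.
    time.sleep(1)
    while not bus.empty():
        print(f"processed {bus.get()}")

producer = threading.Thread(target=sender)
consumer = threading.Thread(target=receiver)
producer.start()
consumer.start()
producer.join()
consumer.join()
```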

That is the entire mental model. Pub/Sub is the message bus. Whenever you see an exam scenario where a producer and a consumer need to operate independently or asynchronously, or where the producer is generating data faster than the consumer can handle, the answer almost always involves Pub/Sub sitting in the middle.

What Pub/Sub actually is

Cloud Pub/Sub is a global-scale messaging service. It decouples senders from receivers by letting senders publish messages to a topic, and letting one or more subscribers pull those messages from a subscription on that topic. The producer does not know or care who the subscribers are. The subscribers do not know or care who produced the message. They only know about the topic.
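
A minimal sketch of that flow with the google-cloud-pubsub Python client, assuming the topic and subscription already exist, credentials are configured, and the project, topic, and subscription IDs shown are placeholders:

```python
from google.cloud import pubsub_v1

project_id = "my-project"          # placeholder
topic_id = "telemetry-events"      # placeholder
subscription_id = "telemetry-sub"  # placeholder

# Publisher side: publish to the topic and wait for the server-assigned ID.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_id)
future = publisher.publish(topic_path, data=b"device=42 temp=21.5")
print(f"published message id {future.result()}")

# Subscriber side: pull from the subscription, then acknowledge.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 10}
)
if response.received_messages:
    for received in response.received_messages:
        print(received.message.data)
    subscriber.acknowledge(
        request={
            "subscription": subscription_path,
            "ack_ids": [m.ack_id for m in response.received_messages],
        }
    )
```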

A few properties the Professional Data Engineer exam expects you to know cold:

  • Global scale. Pub/Sub is built to handle very high message volumes across regions without you provisioning capacity.
  • Serverless and fully managed. There is no cluster to size, no brokers to patch, no nodes to monitor. Google handles all of the underlying infrastructure. This is often described as a No-Ops service.
  • Decoupling for reliability. Because the buffer absorbs spikes and outages, systems built on Pub/Sub are more resilient than systems that wire producers directly to consumers.

If a question describes a workload where ingestion volume is unpredictable, where you want producers and consumers to scale independently, or where the consumer might occasionally be slower than the producer, those are all signals pointing at Pub/Sub.

Pub/Sub vs Apache Kafka

A comparison that comes up a lot is Pub/Sub vs Apache Kafka. The short version is that Pub/Sub is Google's managed take on the same publish-subscribe idea that Kafka popularized as an open-source project. Both are publish-subscribe messaging systems. Both let producers fire messages at topics and let consumers read them, through subscriptions in Pub/Sub and consumer groups in Kafka. The functional model is similar.

The difference is operational. With Kafka, you are responsible for the brokers, the partitioning, the storage, the scaling, the upgrades, and the failure recovery. With Pub/Sub, Google handles all of that. You do not provision nodes. You do not size partitions. You just create a topic, create a subscription, and start publishing. For exam purposes, when a scenario emphasizes minimal operational overhead or rapid scale without infrastructure planning, Pub/Sub is the right pick over self-managed Kafka.
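
As a rough illustration of how little setup that means, creating a topic and a subscription with the Python client is two API calls. The project and resource IDs here are placeholders:

```python
from google.cloud import pubsub_v1

project_id = "my-project"  # placeholder

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, "clickstream")
subscription_path = subscriber.subscription_path(project_id, "clickstream-sub")

# No brokers to provision, no partitions to size: two API calls and
# the topic is ready to absorb traffic.
publisher.create_topic(request={"name": topic_path})
subscriber.create_subscription(
    request={"name": subscription_path, "topic": topic_path}
)
```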

Where Pub/Sub fits in a data pipeline

The most common pattern you will see on the Professional Data Engineer exam is Pub/Sub used for data ingestion at the front of a pipeline. Data sources publish events into a Pub/Sub topic. Those events sit in the buffer. A downstream processing service then reads from the subscription.

The pairing that shows up over and over is Pub/Sub with Dataflow. Pub/Sub collects and buffers the raw events. Dataflow reads from the Pub/Sub subscription, applies real-time transformations or aggregations, and writes the processed output to a sink like BigQuery or Cloud Storage. This is the textbook streaming pipeline on Google Cloud. If you see a question about ingesting streaming data from IoT devices, application logs, or clickstream events into BigQuery, the answer is almost always Pub/Sub feeding Dataflow feeding BigQuery.
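
A skeletal version of that pipeline in the Apache Beam Python SDK might look like the following. The subscription path, BigQuery table, and parsing logic are placeholder assumptions, a sketch rather than a production pipeline:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming mode is required for an unbounded Pub/Sub source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Pub/Sub buffers the raw events; Dataflow reads the subscription.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/clickstream-sub"
        )
        # Apply a real-time transformation to each event.
        | "Parse" >> beam.Map(lambda raw: json.loads(raw.decode("utf-8")))
        # Write the processed output to the BigQuery sink.
        # Assumes the destination table already exists.
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.click_events",
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```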

Other use cases worth keeping in mind include fan-out patterns where one published event needs to trigger many independent consumers, asynchronous workflows where you want to acknowledge a request before fully processing it, and event-driven architectures where services react to changes without polling each other.
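
For the event-driven case, the Python client's streaming pull shows the shape: a subscriber attaches a callback and reacts as messages arrive, with no polling loop in application code. The project and subscription IDs are again placeholders:

```python
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "orders-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # React to the event, then ack so Pub/Sub stops redelivering it.
    print(f"received {message.data}")
    message.ack()

# Streaming pull: the client holds the connection open and invokes the
# callback as events arrive.
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull_future.result(timeout=30)  # listen for 30 seconds
    except TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()
```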

How to think about it on exam day

When you read a Professional Data Engineer scenario, ask yourself whether the producer and consumer are tightly coupled in the proposed design. If they are, and the scenario describes spikes, failures, or scale-independent components, the fix is almost always a message bus. On Google Cloud, that message bus is Pub/Sub. Pair it with Dataflow when there is real-time processing involved, and you have the answer to a large chunk of streaming pipeline questions.

My Professional Data Engineer course covers Pub/Sub in depth, including subscription types, delivery semantics, and the streaming pipelines that build on top of it.
