
One of the most reliable patterns I see on the Professional Data Engineer exam is the Kafka-to-Pub/Sub swap. A company runs Apache Kafka on-premises and wants to move to a managed, autoscaling messaging layer, and the right answer is almost always Cloud Pub/Sub. If you walk into the exam with that single instinct locked in, you will pick up easy points on a surprising number of questions.
But the real Professional Data Engineer exam questions are rarely that clean. They will hand you a hybrid setup, a partial migration, or a logging pipeline that spans two clouds, and you need to know when Pub/Sub stands on its own, when Kafka stays in the picture, and when the two work together through connectors. This article walks through how I think about each of those scenarios.
Pub/Sub is Google Cloud's managed messaging service, and it plays the same role that Apache Kafka plays in a self-hosted stack. Producers publish messages to a topic, consumers receive them through subscriptions attached to that topic, and the service decouples the two so you can scale them independently. The functional surface area is very similar to Kafka's, but Google operates the underlying infrastructure for you. There are no brokers to size, no ZooKeeper or KRaft quorum to babysit, no partition rebalancing to plan around.
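To make that division of labor concrete, here is a minimal sketch of both sides using the google-cloud-pubsub Python client. The project, topic, and subscription names are placeholders I made up for illustration, not anything the exam uses.

```python
# Minimal publish/subscribe sketch with the google-cloud-pubsub library.
# All resource names below are placeholders.
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"            # placeholder
TOPIC_ID = "clickstream-events"      # placeholder
SUBSCRIPTION_ID = "clickstream-sub"  # placeholder

# Producer side: publish a message to the topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
future = publisher.publish(topic_path, data=b"user=42 action=click")
print(f"Published message {future.result()}")  # blocks until the server acks

# Consumer side: receive messages through a subscription attached to the topic.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    print(f"Received: {message.data!r}")
    message.ack()  # acknowledge so Pub/Sub does not redeliver

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull_future.result(timeout=30)  # listen for 30 seconds
    except TimeoutError:
        streaming_pull_future.cancel()
```

Notice there is no broker address, partition count, or consumer group configuration anywhere in that code. That absence is exactly what the exam means by reduced operational overhead.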
That managed nature is the whole reason Pub/Sub keeps showing up as the correct answer on the exam. The question stems almost always include a phrase like "fully managed," "autoscaling," "reduce operational overhead," or "global reach." Those are signals pointing you to Pub/Sub, not Kafka on Compute Engine and not a self-managed cluster of any kind.
The cleanest version of this question looks like this. A company is running Apache Kafka on-premises. They want to migrate to Google Cloud, or they want a managed autoscaling messaging layer, or they want to reduce the team's operational burden. What is the recommended replacement?
Pub/Sub. That is the answer you should default to whenever you see that setup. Drill it into your head: when in doubt, on-prem Kafka goes to Pub/Sub. The reasons the exam will cite are scalability, global reach, and reducing operational overhead, and Pub/Sub checks all three boxes because Google runs it for you across regions.
From there, the downstream pipeline almost always involves Dataflow. Pub/Sub ingests and buffers the data, and Dataflow handles the real-time transformations or analytics. That Pub/Sub plus Dataflow pairing shows up so often on the Professional Data Engineer exam that you should treat it as the default streaming ingestion pattern unless the question explicitly steers you elsewhere.
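If you want to picture what that pairing looks like in code, here is a skeletal Apache Beam pipeline of the kind Dataflow runs: it pulls from a Pub/Sub subscription and streams rows into BigQuery. The project, bucket, subscription, and table names are all placeholders, and the sketch assumes the destination table already exists.

```python
# Skeleton of the Pub/Sub -> Dataflow pattern as an Apache Beam pipeline
# (Dataflow is the runner). All resource names are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,                      # Pub/Sub sources require streaming mode
    runner="DataflowRunner",             # swap for DirectRunner to test locally
    project="my-project",                # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/clickstream-sub")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            # CREATE_NEVER assumes the table exists, so no schema is needed here.
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```

Pub/Sub buffers, Dataflow transforms, BigQuery stores. When a question describes real-time ingestion and analysis on Google Cloud, that three-step chain is the shape to look for.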
The trickier exam scenarios are the ones where both technologies stay in the architecture. These are the hybrid and partial-migration cases, and they come up a lot because real companies do not migrate in a single weekend.
If a question describes one of those shapes, such as a hybrid deployment, a phased migration, or a logging pipeline that spans clouds, do not jump straight to replacing Kafka. The right answer is usually keeping both and connecting them with Pub/Sub as the cloud-side endpoint.
The mechanism that makes those hybrid patterns work is the Pub/Sub Group Kafka Connector. Google provides it specifically so Kafka and Pub/Sub can hand messages back and forth without you writing custom glue code. There are two connector types and the naming trips people up, so I want to be precise about it.
The source connector reads messages from a Pub/Sub topic and publishes them into Kafka. Pub/Sub is the source of the data, Kafka is the destination. You would use this when cloud-based producers are putting messages into Pub/Sub and you need those messages to land in a Kafka topic that some existing on-prem or third-party system already consumes.
The sink connector reads messages from one or more Kafka topics and publishes them to Pub/Sub. Kafka is the source, Pub/Sub is the sink. This is the more common direction in exam questions, because the typical scenario is on-prem Kafka feeding GCP-side analytics through Pub/Sub.
The thing to remember is that source and sink describe Pub/Sub's role, not Kafka's. If Pub/Sub is publishing to Kafka, it is the source. If Pub/Sub is receiving from Kafka, it is the sink. Get that framing right and the connector questions become straightforward.
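To make the direction stick, here is a hedged sketch that registers both connectors with a Kafka Connect worker through its standard REST API. The connector class names and cps.* keys follow the Pub/Sub Group Kafka Connector's documentation as I understand it; the Connect host, Kafka topics, and GCP resource names are placeholders.

```python
# Registering the Pub/Sub Group Kafka Connector's sink and source
# connectors with a Kafka Connect worker via its REST API.
# Host, topic, and project names are placeholders; the connector
# classes and cps.* keys are taken from the connector's docs.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # placeholder Connect worker

# Sink direction: Kafka -> Pub/Sub. Pub/Sub receives, so it is the sink.
sink_connector = {
    "name": "kafka-to-pubsub",
    "config": {
        "connector.class": "com.google.pubsub.kafka.sink.CloudPubSubSinkConnector",
        "tasks.max": "2",
        "topics": "clickstream-events",     # Kafka topic(s) to read from
        "cps.project": "my-project",        # Pub/Sub destination project
        "cps.topic": "clickstream-events",  # Pub/Sub destination topic
    },
}

# Source direction: Pub/Sub -> Kafka. Pub/Sub is read from, so it is the source.
source_connector = {
    "name": "pubsub-to-kafka",
    "config": {
        "connector.class": "com.google.pubsub.kafka.source.CloudPubSubSourceConnector",
        "tasks.max": "2",
        "kafka.topic": "cloud-events",           # Kafka topic to write into
        "cps.project": "my-project",             # Pub/Sub source project
        "cps.subscription": "cloud-events-sub",  # Pub/Sub subscription to pull
    },
}

for connector in (sink_connector, source_connector):
    resp = requests.post(CONNECT_URL, json=connector)
    resp.raise_for_status()
    print(f"Created connector {connector['name']}")
```

Even in this sketch you can see the naming rule at work: the sink config points at a Pub/Sub topic to write into, while the source config points at a Pub/Sub subscription to read from.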
One last pattern worth filing away. Kafka is not the only messaging system you might see in a question stem. Amazon SQS, Redis Pub/Sub, Apache ActiveMQ, and RabbitMQ all show up too. When any of those appear, the same rule applies. If the company wants to move to Google Cloud or modernize their messaging layer, the answer almost always involves transitioning to Cloud Pub/Sub as the cloud-native replacement.
That makes the broader heuristic pretty simple. Any time a question mentions an existing third-party message buffer alongside a goal of running on Google Cloud, Pub/Sub is the destination you should be looking for in the answer choices.
My Professional Data Engineer course covers Pub/Sub, Kafka migration patterns, and the connector model in detail so you walk into the exam knowing exactly which version of the question you are looking at.