
Pub/Sub questions on the Professional Data Engineer exam love to dwell on what happens when messages do not flow cleanly from publisher to subscriber. The exam will rarely ask you to publish a message in a happy-path scenario. It will ask you why message volume just tripled, why a subscriber is burning CPU, or where a failed message ends up after the system gives up. The answers almost always come back to three concepts working together: acknowledgments, exponential backoff, and dead-letter topics.
I want to walk through each one the way I think about them when I am sitting in front of a Pub/Sub question on exam day.
Every Pub/Sub subscription has an acknowledgment deadline. When a subscriber pulls a message, it has until that deadline to send back an ack. If the ack does not arrive in time, Pub/Sub assumes the subscriber failed and redelivers the message. That is the part everyone remembers. The part that trips people up on the Professional Data Engineer exam is the second-order effect.
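To make the deadline concrete, here is a minimal sketch of setting it when creating a subscription with the Python client library. The project, topic, and subscription IDs are placeholders, and 60 seconds is just an example value.

```python
from google.cloud import pubsub_v1

# Placeholder IDs for illustration.
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path("my-project", "my-topic")
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# ack_deadline_seconds is the window a delivered message can sit un-acked
# before Pub/Sub assumes the subscriber failed and redelivers it (10-600s).
subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "ack_deadline_seconds": 60,
    }
)
```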
If your subscriber is silently failing to acknowledge messages, every missed ack becomes a redelivery. A small fleet of unhealthy subscribers can suddenly look like a massive spike in incoming traffic. The exam will frame this as a question along the lines of "my message volume just doubled, what is the first thing I should check?" The correct mental move is to look at acknowledgments, not at the publisher.
There is a sub-pattern worth knowing here. When subscribers fail to ack because of runtime errors, you expect to see those errors in Cloud Logging. If the volume is spiking and Cloud Logging is quiet, that is a tell that the subscriber is not even handling errors properly. The fix then is not just to acknowledge messages faster; you also have to add the missing error handling in the subscriber code.
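A minimal sketch of what proper handling looks like with the Python client library, assuming JSON payloads and placeholder IDs. The try/except is the part that turns a silent redelivery spike into a visible Cloud Logging entry:

```python
import json
import logging
from concurrent.futures import TimeoutError

from google.cloud import pubsub_v1

logging.basicConfig(level=logging.INFO)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    try:
        payload = json.loads(message.data)  # the step that can blow up
        logging.info("processed message %s: %s", message.message_id, payload)
        message.ack()  # ack only after processing succeeds
    except Exception:
        # Without this handler, the error never reaches Cloud Logging and
        # the missed ack just shows up as a mysterious traffic spike.
        logging.exception("failed to process message %s", message.message_id)
        message.nack()  # fail loudly and let Pub/Sub schedule a retry

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)

with subscriber:
    try:
        streaming_pull_future.result(timeout=60)  # stream for up to a minute
    except TimeoutError:
        streaming_pull_future.cancel()
        streaming_pull_future.result()
```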
Redelivery on its own is dangerous. If a subscriber is overloaded and Pub/Sub keeps retrying immediately, the subscriber falls further behind, more acks miss the deadline, and the retry storm gets worse. Backoff is the pressure valve.
Exponential backoff increases the interval between retry attempts as the number of retries grows. The first few retries happen quickly, a few seconds apart. By the eighth or tenth retry, the wait between attempts has stretched out to minutes. The curve is flat at the start and gets steeper as failures accumulate. This gives the subscriber room to recover from a temporary failure instead of being hammered into the ground by retries.
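To see the shape, here is a toy capped-doubling schedule. This illustrates the curve rather than reproducing Pub/Sub's exact internal timings, which are not published; the 600-second cap matches the maximum backoff Pub/Sub lets you configure.

```python
# Toy capped-doubling schedule: 1s first wait (assumed), 600s cap.
BASE_SECONDS = 1
CAP_SECONDS = 600

for attempt in range(1, 11):
    wait = min(CAP_SECONDS, BASE_SECONDS * 2 ** (attempt - 1))
    print(f"retry {attempt:2d}: wait {wait:3d}s")
```

The first retry waits a second; the tenth waits over eight minutes. That widening gap is exactly the breathing room a struggling subscriber needs.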
Linear and constant backoff strategies exist, but on the Professional Data Engineer exam, the answer you want to recognize is exponential backoff. It is the recommended strategy in Pub/Sub because it retries quickly while a failure is likely transient and backs off hard when the failure persists, protecting the subscriber without sacrificing responsiveness. If a question describes a subscriber that is overwhelmed by retries during a transient outage, exponential backoff is the configuration knob that solves it.
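On a real subscription, that knob is a retry policy with a minimum and maximum backoff, and Pub/Sub scales the delay exponentially between the two bounds. A sketch with the Python client, using placeholder IDs and the documented 10-second and 600-second limits:

```python
from google.cloud import pubsub_v1
from google.protobuf import duration_pb2

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path("my-project", "my-topic")
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Redelivery waits grow exponentially between these two bounds.
retry_policy = pubsub_v1.types.RetryPolicy(
    minimum_backoff=duration_pb2.Duration(seconds=10),
    maximum_backoff=duration_pb2.Duration(seconds=600),
)

subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "retry_policy": retry_policy,
    }
)
```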
Backoff buys time, but no amount of waiting will help a message that is genuinely broken. A poison message that throws a parsing error every single time will keep failing forever if you let it. That is what the max-retries setting, maximum delivery attempts in Pub/Sub's own vocabulary, addresses. You configure a maximum number of delivery attempts, and once the count is hit, the message is considered permanently failed.
A reasonable default that shows up in exam-style examples is ten retries, and Pub/Sub itself accepts anything from 5 to 100 delivery attempts. The exact number matters less than the concept: you are drawing a line between "this might recover" and "this is not going to work, stop trying."
Once max retries is reached, the message has to go somewhere. If Pub/Sub just dropped it, you would lose data with no way to investigate. Dead lettering solves that. You configure a separate topic, called a dead-letter topic, and any message that hits the max-retry threshold gets routed there instead of being discarded.
The flow looks like this: publish → delivery to the subscriber fails → retry with exponential backoff → maximum delivery attempts reached → message routed to the dead-letter topic.
From there, the dead-letter topic is just a normal Pub/Sub topic. You can attach a subscriber to it, dump the messages into BigQuery or Cloud Storage, and investigate at your own pace. The original pipeline keeps flowing because problematic messages have been quarantined.
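Configuring it is one policy on the subscription. Here is a sketch with the Python client, placeholder IDs again; max_delivery_attempts is the max-retries line from earlier:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path("my-project", "my-topic")
subscription_path = subscriber.subscription_path("my-project", "my-subscription")
# The dead-letter topic is an ordinary topic you create ahead of time.
dead_letter_topic_path = publisher.topic_path("my-project", "my-dead-letter-topic")

dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
    dead_letter_topic=dead_letter_topic_path,
    max_delivery_attempts=10,  # allowed range is 5-100
)

subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": topic_path,
        "dead_letter_policy": dead_letter_policy,
    }
)
```

One operational gotcha worth knowing: forwarding only happens if the Pub/Sub service account has permission to publish to the dead-letter topic and to subscribe on the source subscription, so granting those roles is part of the setup.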
This is the design pattern the Professional Data Engineer exam will reward you for recognizing. If a scenario describes a stream where a few bad messages keep blocking processing, or where the team wants to inspect failed messages without halting the pipeline, the answer is to configure a dead-letter topic on the subscription.
The way I memorize this for the exam is by chaining the failure modes: a missed ack triggers a redelivery, redeliveries get spaced out by exponential backoff, backoff gives up once maximum delivery attempts is reached, and the exhausted message lands in the dead-letter topic.
If you can walk that chain from memory, most Pub/Sub reliability questions on the exam stop being tricky. The wrong answers will usually try to send you toward publisher-side changes, like throttling the producer or increasing topic retention. Those are distractors. The real fix for redelivery and failure handling lives on the subscription side.
My Professional Data Engineer course covers Pub/Sub acknowledgments, exponential backoff, max retries, and dead-letter topics with the framing you need to answer these questions quickly on exam day.