
Pub/Sub questions on the Professional Data Engineer exam tend to cluster around the same few mechanics, and the message lifecycle is the spine that ties them together. If you can describe what happens to a message from publication through deletion, and where retention, snapshots, and seek fit into that flow, most of the trickier scenario questions become a lot easier to answer.
I want to walk through the lifecycle the way I think about it on the exam, then layer in the three replay and recovery features that the Professional Data Engineer blueprint loves to test.
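Every Pub/Sub message moves through the same sequence: a publisher sends it to a topic, Pub/Sub stores it, Pub/Sub delivers it to each subscription, a subscriber acknowledges it, and Pub/Sub deletes it from that subscription's backlog. Memorizing this order makes it much easier to spot wrong answers on scenario questions.
The exam likes to attack that last step, deletion after acknowledgment. Candidates assume an acknowledged message is unrecoverable, but the retention and seek features can change that assumption. That is exactly where the next three concepts come in.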
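To make the ordering concrete, here is a minimal sketch of the lifecycle as a small state machine. It is a toy model I wrote for studying, not the Pub/Sub client library: the `State` names, the `TRANSITIONS` table, and the `Message` class are all hypothetical, and the model assumes the default behavior where an acknowledged message is simply deleted.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PUBLISHED = "published"        # accepted by the topic and stored
    DELIVERED = "delivered"        # sent to a subscriber, awaiting an ack
    ACKNOWLEDGED = "acknowledged"  # subscriber confirmed processing
    DELETED = "deleted"            # removed from the subscription backlog


# Legal transitions in this toy model (an assumption, not an official spec).
# A delivered message whose ack deadline expires goes back to PUBLISHED
# so that it can be redelivered.
TRANSITIONS = {
    State.PUBLISHED: {State.DELIVERED},
    State.DELIVERED: {State.ACKNOWLEDGED, State.PUBLISHED},
    State.ACKNOWLEDGED: {State.DELETED},
    State.DELETED: set(),
}


@dataclass
class Message:
    data: str
    state: State = State.PUBLISHED
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the starting state so the history shows the full path.
        self.history.append(self.state)

    def advance(self, new_state: State) -> None:
        """Move to new_state, rejecting any step the lifecycle does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


msg = Message("order-123")
for step in (State.DELIVERED, State.ACKNOWLEDGED, State.DELETED):
    msg.advance(step)

print([s.name for s in msg.history])
# ['PUBLISHED', 'DELIVERED', 'ACKNOWLEDGED', 'DELETED']
```

The model deliberately has no path out of `DELETED`. Retention and seek are what loosen the `ACKNOWLEDGED` to `DELETED` step in the real service, which is why the exam spends so much time there.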
Message retention duration is the time period for which Pub/Sub keeps messages before deleting them. The whole point of retention is to make messages available for replay later, whether for reprocessing, auditing, or recovering from a bad downstream change.
There are two flavors of retention, topic retention and subscription retention, and the Professional Data Engineer exam expects you to pick the right one for the scenario.
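The defaults matter. Topic retention is off unless you set it, and a subscription holds unacknowledged messages for a limited window but drops acknowledged ones unless retaining acked messages is enabled. If a question describes a subscriber crashing and asks why messages could not be recovered, check which window actually applied: unacked messages expire once the subscription's retention duration runs out, and acked messages can be replayed only if the topic or the subscription was configured to keep them. Topic retention keeps every published message, acked or not, for all subscriptions on that topic. Subscription retention applies to a single subscription. They solve different problems.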
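The interaction between the two retention settings is easier to reason about as a single decision function. The sketch below is my own simplification, assuming that topic retention covers every published message regardless of ack state and that a subscription keeps acked messages only when an opt-in flag is set. The function name `is_recoverable` and its parameters are hypothetical and do not correspond to any Pub/Sub API call.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_recoverable(
    publish_time: datetime,
    now: datetime,
    acked: bool,
    topic_retention: Optional[timedelta],
    subscription_retention: timedelta,
    retain_acked_messages: bool,
) -> bool:
    """Return True if this toy model says the message can still be delivered or replayed."""
    age = now - publish_time

    # Topic retention (when configured) keeps every published message,
    # acknowledged or not, for the whole window.
    if topic_retention is not None and age <= topic_retention:
        return True

    # Otherwise only the subscription's own window applies.
    if age > subscription_retention:
        return False

    # Inside that window, unacked messages are always held; acked
    # messages are held only when the subscription opted in.
    return (not acked) or retain_acked_messages


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
published = now - timedelta(days=2)
week = timedelta(days=7)

# Acked message, no topic retention, subscription not retaining acked messages.
print(is_recoverable(published, now, True, None, week, False))   # False
# Same message, but the topic retains messages for 7 days.
print(is_recoverable(published, now, True, week, week, False))   # True
# Unacked message inside the subscription window.
print(is_recoverable(published, now, False, None, week, False))  # True
```

Running a scenario question through these four inputs (message age, ack state, topic window, subscription settings) is usually enough to see which retention flavor the question is really about.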
A snapshot captures a specific state of a subscription as a recovery point for potential future use. The mental model I use is a photo of the subscription at a moment in time, where Pub/Sub remembers exactly which messages were acked and which were not.
The classic use case is creating a known good state to revert to before a major change. If I am about to roll out a big update to my processing logic, I take a snapshot of the subscription first. If the new code mishandles messages, I can return the subscription to that earlier acknowledgment state and reprocess every message that was unacknowledged when the snapshot was taken, along with everything published since.
For the Professional Data Engineer exam, the trigger phrase is usually something like "before deploying a new version of the pipeline" or "in case the new processing logic has a bug." When you see that framing, snapshots are almost always part of the answer.
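The "photo of the subscription" idea can be sketched in a few lines. This is a toy in-memory model under my own assumptions, not the Pub/Sub client: `ToySubscription`, `ToySnapshot`, and their methods are hypothetical names. It assumes that restoring a snapshot makes unacked both the messages that were unacked at capture time and anything published afterwards.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToySnapshot:
    unacked_at_capture: frozenset  # messages still unacked when the photo was taken
    seen_at_capture: frozenset     # every message the subscription knew about then


@dataclass
class ToySubscription:
    unacked: set = field(default_factory=set)
    acked: set = field(default_factory=set)

    def publish(self, message_id: str) -> None:
        self.unacked.add(message_id)

    def ack(self, message_id: str) -> None:
        self.unacked.remove(message_id)
        self.acked.add(message_id)

    def create_snapshot(self) -> ToySnapshot:
        return ToySnapshot(
            frozenset(self.unacked), frozenset(self.unacked | self.acked)
        )

    def seek(self, snap: ToySnapshot) -> None:
        """Restore the acknowledgment state captured in snap."""
        everything = self.unacked | self.acked
        published_later = everything - snap.seen_at_capture
        # Unacked at capture time, plus anything newer, must be processed again.
        self.unacked = set(snap.unacked_at_capture) | published_later
        self.acked = everything - self.unacked


sub = ToySubscription()
for mid in ("m1", "m2", "m3"):
    sub.publish(mid)
sub.ack("m1")

snap = sub.create_snapshot()   # taken just before the risky deployment

# The new pipeline version acks everything, possibly after mishandling it.
sub.ack("m2")
sub.ack("m3")
sub.publish("m4")
sub.ack("m4")

sub.seek(snap)                 # roll the acknowledgment state back
print(sorted(sub.unacked))     # ['m2', 'm3', 'm4']
print(sorted(sub.acked))       # ['m1']
```

Only `m1`, which was already acked before the snapshot, stays acked after the rollback. That is the behavior to picture when a question mentions deploying new processing logic.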
The Seek feature is what lets you actually use a snapshot or a retention window. Seek lets you change the acknowledgment state of messages, including already-acknowledged messages, in bulk. That is the line that surprises people. Acked does not have to mean gone.
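There are two ways to seek. You can seek to a snapshot, which restores the acknowledgment state captured when the snapshot was taken, or you can seek to a timestamp, which marks retained messages published before that time as acknowledged and retained messages published after it as unacknowledged.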
Seek only works within the retention window. If a message is no longer being retained, there is nothing to replay. That coupling between retention duration and the seek feature is the piece the exam loves to test. Retention determines how far back you can go. Seek determines how you navigate within that window.
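The coupling between retention and seek can be illustrated with a short simulation. This is a hedged sketch rather than real Pub/Sub code: `seek_to_time` is a hypothetical helper, messages are plain dictionary entries, and treating a message published exactly at the target time as unacknowledged is my assumption.

```python
from datetime import datetime, timedelta, timezone


def seek_to_time(messages, target, now, retention):
    """Simulate seeking to a timestamp.

    messages maps message ID to publish time. Returns a pair
    (acked_ids, unacked_ids) covering only messages still retained.
    """
    oldest_retained = now - retention
    # Anything older than the retention window is gone and cannot be replayed.
    retained = {mid: t for mid, t in messages.items() if t >= oldest_retained}
    acked = sorted(mid for mid, t in retained.items() if t < target)
    unacked = sorted(mid for mid, t in retained.items() if t >= target)
    return acked, unacked


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
messages = {
    "a": now - timedelta(days=10),  # outside a 7 day window
    "b": now - timedelta(days=3),
    "c": now - timedelta(days=1),
}
week = timedelta(days=7)

# Seek back 5 days: "a" is already gone, so only "b" and "c" are replayed.
print(seek_to_time(messages, now - timedelta(days=5), now, week))
# ([], ['b', 'c'])

# Seek back 2 days: "b" is marked acked, "c" is replayed.
print(seek_to_time(messages, now - timedelta(days=2), now, week))
# (['b'], ['c'])
```

Message `a` never shows up in either list, no matter how far back the target is. Retention sets the boundary and seek only moves the acknowledgment line inside it.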
When I read a Pub/Sub scenario on the Professional Data Engineer exam, I run through the same checklist:
If you can hold those four mappings in your head, you can usually narrow most Pub/Sub questions down to two answer choices before reading the options. The remaining work is just confirming the defaults: topic retention is off by default with a 31 day max, subscription retention of unacknowledged messages is 7 days by default with a 31 day max, retaining acknowledged messages on a subscription is off by default, and seek can target either a snapshot or a timestamp inside the retention window.