
Dataflow shows up all over the Professional Data Engineer exam, and a chunk of the trickier questions are not about building a pipeline from scratch. They are about diagnosing one that is already running and behaving badly. The three failure modes Google keeps coming back to are increased latency, missing messages in streaming pipelines, and out-of-order data. If you can talk through each of these with confidence, you will pick up a meaningful number of points on exam day.
Here is how I think about each one.
Latency questions usually drop a scenario in your lap where a Dataflow job that used to run cleanly is suddenly slow, and the exam wants to know what you check first. The framing I keep in my head is that there are two flavors of latency worth watching: end-to-end latency, which is the time it takes for a record to traverse the entire pipeline, and per-stage latency, which is how long an individual stage holds onto data before passing it on.
The signals that point to a bottleneck are pretty consistent:

- System latency climbing steadily, meaning records are spending longer and longer inside the pipeline.
- Data freshness (watermark lag) growing instead of holding flat.
- A backlog building up at the source because the pipeline cannot keep pace with incoming data.
- Wall time piling up in one stage of the job graph while the others stay quiet.
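Those signals are also queryable outside the console, since Cloud Monitoring exposes them as time series. Here is a minimal sketch, assuming hypothetical project and job names, that pulls a job's system lag over the last hour:

```python
import time

from google.cloud import monitoring_v3

# Sketch: read Dataflow's system lag metric from Cloud Monitoring.
# "my-project" and "my-streaming-job" are hypothetical placeholders.
client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'metric.type="dataflow.googleapis.com/job/system_lag" '
            'AND resource.labels.job_name="my-streaming-job"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        # A value that climbs steadily instead of oscillating is the signal to chase.
        print(point.interval.end_time, point.value)
```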
When you see those signals, the troubleshooting path is fairly mechanical. First, pull the pipeline job logs and the worker logs. Those give you detailed information about what is happening inside each stage and on each VM. Second, identify the specific step that is dragging. The exam will often hand you a graph or a description and expect you to point at the slow stage rather than rewriting the whole job. Third, look for the underlying cause, which usually falls into one of three buckets: resource limitations on the workers, data skew where one worker is processing far more records than its peers, or an external dependency such as a slow downstream sink.
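Both log streams are also reachable programmatically through Cloud Logging. A short sketch, with a hypothetical project and job ID, that pulls warnings and errors from the worker logs:

```python
from google.cloud import logging

# Sketch: fetch Dataflow worker log entries at WARNING and above.
# The project ID and job ID are hypothetical placeholders.
client = logging.Client(project="my-project")
log_filter = (
    'resource.type="dataflow_step" '
    'AND resource.labels.job_id="2024-06-01_00_00_00-1234567890" '
    # Swap %2Fworker for %2Fjob-message to read the job-level log instead.
    'AND logName="projects/my-project/logs/dataflow.googleapis.com%2Fworker" '
    'AND severity>=WARNING'
)
for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.severity, entry.payload)
```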
Data skew is the one I would internalize most. If the question describes one worker pegged at full CPU while the others are idle, that is the answer they want you to spot.
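If the skew traces back to a hot key, one standard Beam mitigation is combiner fanout, which pre-aggregates the hot key's values on several intermediate workers before a final merge. A minimal sketch, with contrived data so that one key carries nearly all the volume:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        # One key carries almost all of the records: the classic skew shape.
        | "Events" >> beam.Create([("hot_user", 1)] * 10000 + [("quiet_user", 1)])
        # Fanout spreads "hot_user" across up to 16 intermediate keys,
        # so no single worker has to sum all of its records alone.
        | "Sum" >> beam.CombinePerKey(sum).with_hot_key_fanout(16)
        | "Show" >> beam.Map(print)
    )
```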
The second failure mode is messages disappearing from a streaming pipeline. The Professional Data Engineer exam likes this one because the right answer is not obvious. You have to know a specific diagnostic technique.
The symptoms you watch for are gaps in the data, aggregations that look incomplete relative to what you expected, and a sudden drop in throughput. Any of those can mean records are being dropped somewhere between your source and your sink.
The technique Google teaches for this scenario is to convert the streaming pipeline into a batch run and compare the outputs. The exam answer almost always walks through these four steps:

1. Capture a bounded copy of the input for the suspect time range, such as the raw events already archived to Cloud Storage.
2. Run the same pipeline logic over that bounded input as a batch job.
3. Compare the batch output with what the streaming job produced for the same time range.
4. If the batch run contains the missing records, turn to the streaming configuration, especially windowing, allowed lateness, and triggers, because that is where the records are being dropped.
The reason this works is that batch processing strips away all the timing concerns. There are no late records, no watermark issues, no window expirations. If the batch run returns the records you thought were lost, you have proven the data was always there and the streaming configuration is what dropped them. That isolation is the whole point of the exercise.
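In code, the conversion is mostly a source swap: the transforms stay identical and the unbounded read is replaced by a bounded read over an archive of the same events. A sketch, with hypothetical bucket paths and record fields:

```python
import json

import apache_beam as beam

def parse_record(line):
    """Same parsing the streaming job uses."""
    return json.loads(line)

def count_per_user(records):
    """The shared pipeline logic under test."""
    return (
        records
        | "KeyByUser" >> beam.Map(lambda r: (r["user_id"], 1))
        | "CountPerUser" >> beam.combiners.Count.PerKey()
    )

# The production streaming source would be something like:
#   p | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
# The batch replay reads the archived copy of the same events instead:
with beam.Pipeline() as p:
    parsed = (
        p
        | "ReadArchive" >> beam.io.ReadFromText("gs://my-bucket/events/2024-06-01/*.json")
        | "Parse" >> beam.Map(parse_record)
    )
    counts = count_per_user(parsed)
    counts | "Write" >> beam.io.WriteToText("gs://my-bucket/debug/batch_counts")
```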
The third common challenge is out-of-order data, which is just the reality of streaming. Records do not arrive at Dataflow in the same order they were generated. Network paths differ, devices buffer, and clocks drift. The exam expects you to know the three primitives Dataflow gives you to handle this and what each one does:

- Windows carve the unbounded stream into finite, event-time chunks so that aggregations have a defined scope.
- Watermarks are Dataflow's running estimate of how far event time has progressed, which is how the system decides a window is probably complete.
- Triggers control when a window's results are emitted, including firing again when late records arrive after the watermark has passed.
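In Beam's Python SDK, the three primitives compose in a single `WindowInto`. A sketch of the pattern (the inline `Create` source just keeps it self-contained; late firings only matter on a real unbounded source):

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import AccumulationMode, AfterCount, AfterWatermark

with beam.Pipeline() as p:
    (
        p
        | "Events" >> beam.Create([("click", 3), ("click", 42), ("click", 75)])
        # Assign event-time timestamps so windowing has something to go on.
        | "Stamp" >> beam.Map(lambda kv: window.TimestampedValue(kv, kv[1]))
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),                     # windows: one-minute event-time chunks
            trigger=AfterWatermark(late=AfterCount(1)),  # watermark fires the on-time pane,
                                                         # then once more per late record
            allowed_lateness=600,                        # keep window state 10 min past the watermark
            accumulation_mode=AccumulationMode.ACCUMULATING,
        )
        | "Count" >> beam.combiners.Count.PerKey()
        | "Show" >> beam.Map(print)
    )
```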
The exam loves combination questions on this. You will see a scenario where late records need to update an existing aggregate, and the correct answer involves a window plus a trigger that fires on late data. You will see another scenario where the team wants results emitted on a regular schedule even if data is still arriving, and the right answer is a processing-time trigger. Knowing which primitive solves which problem is the whole game.
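For that second scenario, here is a hedged sketch of a processing-time trigger: results are emitted every 30 seconds of wall-clock time no matter where the watermark sits. The function name and setup are illustrative:

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import (
    AccumulationMode,
    AfterProcessingTime,
    Repeatedly,
)

def apply_periodic_counts(events):
    """Emit a running count on a wall-clock schedule, not an event-time one."""
    return (
        events
        | "Window" >> beam.WindowInto(
            window.GlobalWindows(),
            # Fires 30 seconds after the first element of each pane, forever.
            trigger=Repeatedly(AfterProcessingTime(30)),
            # DISCARDING: each pane reports only records since the last firing.
            accumulation_mode=AccumulationMode.DISCARDING,
        )
        | "Count" >> beam.combiners.Count.Globally().without_defaults()
    )
```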
When you see a Dataflow troubleshooting question on the Professional Data Engineer exam, slot it into one of these three buckets first. Latency means logs and bottleneck analysis. Missing messages means the batch comparison technique. Out-of-order data means windows, watermarks, and triggers. That framing turns a vague troubleshooting prompt into a known recipe.
My Professional Data Engineer course covers Dataflow troubleshooting patterns, streaming semantics, and the windowing model in depth.