
Streaming pipelines never stop. Data flows in constantly, which raises an immediate question for anyone building on Dataflow: how do you group elements together for aggregation when the stream has no natural end? The answer the Professional Data Engineer exam expects you to know is windowing. Dataflow gives you three window types, and each one solves a different problem. If you can pick the right window for a given scenario, you will get a real chunk of the streaming questions on the exam without thinking twice.
I want to walk through tumbling, hopping, and session windows the way I think about them when I'm building real pipelines, because the exam scenarios almost always map to one of those mental models.
A bounded batch job is easy. You read the file, you compute the average, you write the output. A streaming job has no end, so you cannot compute an average over the whole stream because the stream never finishes. Windows let you slice an unbounded stream into bounded chunks that you can run aggregations against. Once you start thinking about windows as the only way to apply a GROUP BY to an infinite stream, the three types start to make a lot more sense.
Dataflow gives you three window strategies: tumbling (also called fixed), hopping (also called sliding), and session-based.
Tumbling windows are the simplest. You pick a duration, say 30 minutes, and Dataflow divides the stream into back-to-back chunks of that length. The first window covers 12:00 to 12:30, the next covers 12:30 to 1:00, the next covers 1:00 to 1:30, and so on. Every event belongs to exactly one window. There is no overlap.
Three properties define tumbling windows:

- The window size is fixed. You pick one duration and every window has that length.
- Consecutive windows do not overlap. Each window starts exactly where the previous one ends.
- Every event lands in exactly one window.
Use tumbling windows when you want clean, periodic reports. Average order value per hour. Total error count per 5 minutes. Page views per day. Anything where the report represents a discrete time bucket and you do not want the buckets to share data.
On the exam, look for words like every 5 minutes, hourly summary, or non-overlapping reporting period. Those almost always point to fixed windows.
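To make the assignment rule concrete, here is a minimal plain-Python sketch of the arithmetic behind fixed windows. This is not the Beam API, just the math: each timestamp maps to exactly one window aligned to multiples of the window size.

```python
def tumbling_window(ts_seconds: int, size_seconds: int) -> tuple:
    """Return the (start, end) of the single fixed window containing ts_seconds."""
    # Snap the timestamp down to the nearest multiple of the window size.
    start = ts_seconds - (ts_seconds % size_seconds)
    return (start, start + size_seconds)

# An event at 12:10 with 30-minute windows lands in [12:00, 12:30).
assert tumbling_window(12 * 3600 + 10 * 60, 30 * 60) == (12 * 3600, 12 * 3600 + 30 * 60)
```

Because the modulo snaps every timestamp to one and only one window start, no event can straddle two windows, which is exactly the non-overlap property.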
Hopping windows are where things get more interesting. The window itself is still a fixed size, but a new window starts on a fixed interval that is shorter than the window length. That second number is called the hop.
Take a 30-minute window with a 5-minute hop. At 12:30 you get a window covering 12:00 to 12:30. At 12:35 you get a window covering 12:05 to 12:35. At 12:40 you get a window covering 12:10 to 12:40. Each event in the stream belongs to multiple windows because the windows overlap.
The three defining properties:

- The window size is fixed, just like a tumbling window.
- A new window starts on every hop interval, and the hop is shorter than the window size.
- Windows overlap, so a single event belongs to multiple windows.
The use case is a running metric that you want updated frequently but computed over a longer horizon. A 20-minute moving average of stock prices recomputed every minute is the canonical example. You want fresh output every minute, but you want each output to reflect 20 minutes of context. Tumbling windows cannot do this. Hopping windows are built for it.
On the exam, the giveaway phrase is something like compute the last X minutes every Y minutes where X is bigger than Y. That is a hopping window with a window size of X and a hop of Y.
Session windows throw out the idea of a fixed duration entirely. Instead of saying every 30 minutes, you say keep grouping events together until I see a gap of N minutes with no activity. When the gap occurs, the window closes. The next event starts a fresh window.
So if I set a 5-minute gap duration, and events arrive every 4 minutes and 59 seconds for an hour straight, Dataflow treats all of it as a single session. The moment 5 minutes go by with nothing, the session ends. The next event opens a new one.
The properties:

- The window length is not fixed. A session grows for as long as activity keeps arriving.
- A window closes only after a configured gap of inactivity.
- Sessions are computed per key, so each user or device gets its own session boundaries.
The classic example is the way Google Analytics defines a website session, which uses a 30-minute gap. Your visit stays one session until you go 30 minutes without doing anything. Anything that looks like that pattern (a user's gameplay session, a device's burst of telemetry, a customer's support chat) is a session-window problem.
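The gap-merging behavior can be illustrated with a small plain-Python sketch. This is not how Beam implements session windows internally; it just shows the grouping rule: an event joins the current session if it arrives within the gap of the previous event, otherwise it opens a new one.

```python
def sessionize(timestamps: list, gap_seconds: int) -> list:
    """Group sorted event timestamps into (first_event, last_event) sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] < gap_seconds:
            # Within the gap of the last event: extend the current session.
            sessions[-1] = (sessions[-1][0], ts)
        else:
            # Gap exceeded (or first event): start a new session.
            sessions.append((ts, ts))
    return sessions

# Events 299 seconds apart stay in one session under a 5-minute gap;
# the event at 1200 arrives 602 seconds after the last one, so it starts fresh.
assert sessionize([0, 299, 598, 1200], 300) == [(0, 598), (1200, 1200)]
```

This mirrors the 4-minutes-59-seconds example above: as long as each event arrives inside the gap, the session keeps extending, no matter how long it has already run.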
When you read a streaming scenario on the exam, run through these three questions in order:

- Does the scenario describe bursts of activity separated by idle gaps, like user sessions or device telemetry? If so, it is a session window.
- Does it ask for a metric over the last X minutes, refreshed every Y minutes, with X larger than Y? If so, it is a hopping window.
- Does it describe clean, periodic, non-overlapping reports? If so, it is a tumbling window.
The Professional Data Engineer exam loves to test whether you can distinguish hopping from tumbling, because both have a fixed window size. The discriminator is overlap and the hop interval. And session-window questions almost always include words like activity, inactivity, or user behavior in the scenario.
One more thing. Windows work on event time by default in Dataflow, which means late-arriving data still gets routed to the window it belongs to (subject to your allowed lateness settings). That detail matters for exam questions about correctness in streaming, but the windowing strategy itself is independent of how you handle late data.
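A quick sketch makes the event-time point concrete. Assignment depends only on the element's event timestamp, not on when it reaches the pipeline, so a late element still maps to its original window (whether that window will still accept it is governed by allowed lateness, which is separate).

```python
def tumbling_window(ts_seconds: int, size_seconds: int) -> tuple:
    """Assign a timestamp to its fixed window by event time."""
    start = ts_seconds - (ts_seconds % size_seconds)
    return (start, start + size_seconds)

event_time = 12 * 3600 + 4 * 60      # the event happened at 12:04
arrival_time = 12 * 3600 + 35 * 60   # but reached the pipeline at 12:35
# Assignment looks only at event_time, so the element still belongs to
# [12:00, 12:30) even though it arrived after that window's end.
assert tumbling_window(event_time, 30 * 60) == (12 * 3600, 12 * 3600 + 30 * 60)
```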
My Professional Data Engineer course covers Dataflow windowing along with watermarks, triggers, and the full streaming model you need for the exam.