
Throughput and latency are two of the most fundamental concepts on the Google Cloud Professional Data Engineer exam, and they show up in question stems more often than most candidates expect. The exam will hand you a scenario where a customer wants a pipeline that ingests massive volumes of clickstream events, and then on the very next question it will ask you to design a system where a dashboard refreshes within milliseconds of an event happening. These are not the same problem. Confusing them is one of the easier ways to pick the wrong answer between two services that both look reasonable on the surface.
I want to walk through how I think about these two metrics, how they differ, and how to spot which one a Professional Data Engineer question is actually testing.
Throughput is the amount of data a system can process over a period of time. The typical units are gigabytes per second or records per second. When a pipeline ingests 2 GB/s of incoming events without breaking a sweat, that pipeline has high throughput.
I like the highway analogy here. Throughput is the number of cars that can pass a given point on a highway every second. The more lanes the highway has, the more cars move through at once. A 10-lane highway lets more vehicles pass per second than a 2-lane road, even if every individual car is driving the same speed. In data terms, more parallel workers, more partitions, or more shards translate to higher throughput.
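To make that concrete, here is a back-of-the-envelope sketch in Python. The numbers are ones I am assuming purely for illustration; the point is that aggregate throughput is per-worker rate times the number of parallel workers.

```python
# Back-of-the-envelope throughput math with assumed, illustrative numbers.
per_worker_mb_per_s = 50        # each parallel worker sustains 50 MB/s
workers = 40                    # lanes on the highway

aggregate_mb_per_s = per_worker_mb_per_s * workers   # 2,000 MB/s
daily_tb = aggregate_mb_per_s * 86_400 / 1_000_000   # MB/day -> TB/day

print(f"{aggregate_mb_per_s / 1000:.1f} GB/s, about {daily_tb:.0f} TB/day")
# -> 2.0 GB/s, about 173 TB/day
```

That is how a seemingly modest 50 MB/s per worker turns into the 2 GB/s pipeline from the definition above: you add lanes, not faster cars.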
Services on Google Cloud that get described in terms of throughput include Pub/Sub, Dataflow, Bigtable, and BigQuery for batch loads. When you see a question mention petabytes per day, millions of events per second, or a fleet of IoT devices firing telemetry, the design constraint is almost always throughput.
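When throughput is the constraint, the usual lever is batching and parallelism rather than per-message speed. As a minimal sketch, assuming a hypothetical project and topic name, here is how the Pub/Sub Python client can be tuned to batch more aggressively on the publish side:

```python
from google.cloud import pubsub_v1

# Batch more aggressively: trade a few milliseconds of publish latency
# for much higher aggregate throughput.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,       # up to 1,000 messages per batch
    max_bytes=1024 * 1024,   # or 1 MB, whichever fills first
    max_latency=0.05,        # wait at most 50 ms for a batch to fill
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)

# "my-project" and "clickstream-events" are placeholder names.
topic_path = publisher.topic_path("my-project", "clickstream-events")

future = publisher.publish(topic_path, data=b'{"event": "click"}')
print(future.result())  # server-assigned message ID once the batch lands
```

Notice the tradeoff baked into those settings: each individual message may sit up to 50 ms waiting for its batch, which is exactly the throughput-over-latency posture these services are built around.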
Latency is the delay between when data is ingested and when it becomes available for querying or further processing. Low latency means that delay is minimal and data is usable in near real-time. A system where ingested data becomes queryable within a few milliseconds is a low-latency system.
Back to the highway analogy. You exit the highway and hit a traffic light at the intersection. That pause at the light is latency. It does not matter how many lanes the highway had. What matters is how long any single car waits at the off-ramp before it can keep moving.
On Google Cloud, low-latency reads typically come from Bigtable, Memorystore, or Spanner. For analytics, BigQuery streaming inserts give you near real-time queryability of new rows. When a question talks about sub-second response times, real-time fraud detection, or dashboards that refresh as events happen, the design constraint is latency.
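As a minimal sketch of that last point, assuming a hypothetical project, dataset, and table, a BigQuery streaming insert with the Python client looks like this:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")    # hypothetical project
table_id = "my-project.analytics.click_events"    # hypothetical table

rows = [
    {"user_id": "u123", "event": "click", "event_ts": "2024-06-01T12:00:00Z"},
]

# Streamed rows become queryable in near real time, unlike batch loads,
# which optimize for volume over freshness.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print(f"Streaming insert errors: {errors}")
```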
This is the trap. The exam loves to test whether you understand that these two metrics are independent. A system can have high throughput and high latency at the same time. A system can also have low latency and low throughput.
The picture that locked this in for me is the toll plaza in China with more than 40 toll booths. Every individual car waits a long time at the booth, so the latency is high. But because there are so many booths operating in parallel, the total number of cars processed per hour is enormous. High latency, high throughput, same system.
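The toll plaza is really just Little's Law in disguise: in steady state, throughput equals concurrency divided by per-item latency. A quick sketch, with numbers I am inventing for illustration:

```python
# Little's Law in steady state: throughput = concurrency / latency.
booths = 40            # parallel toll booths
wait_seconds = 120     # each car spends two minutes at a booth: high latency

cars_per_second = booths / wait_seconds
print(f"{cars_per_second * 3600:.0f} cars per hour")   # 1200 cars/hour
```

Add booths and throughput climbs even though no individual car gets through any faster. Speed up each booth and latency drops even if the plaza is a single lane. The two knobs are independent.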
When I read a scenario question, I look for the verb and the unit. If the prompt is talking about volume per unit time, I am in throughput territory. If the prompt is talking about freshness, staleness, or time-to-availability, I am in latency territory.
A few patterns worth memorizing:

- Petabytes per day, millions of events per second, fleets of IoT devices firing telemetry: the constraint is throughput, and the answers tend to involve Pub/Sub, Dataflow, Bigtable, or BigQuery batch loads.
- Sub-second response times, real-time fraud detection, dashboards that refresh as events happen: the constraint is latency, and the answers tend to involve Bigtable, Memorystore, Spanner, or BigQuery streaming inserts.
- Nightly reports and scheduled batch jobs: latency barely matters, so optimize for throughput and cost.
The trick is to not pattern-match on the data volume alone. A question that mentions 5 TB of daily data does not automatically demand a low-latency service. Look for what the consumer of the data needs. If the consumer is a nightly report, latency does not matter. If the consumer is a live dashboard or a real-time alerting system, latency is the whole game.
Before I commit to an answer on a Professional Data Engineer scenario, I ask myself two questions in order. First, how much data per second does this system have to absorb? Second, how fresh does the output have to be? Those two answers point me at the right service combination almost every time, and they keep me from picking a low-latency service when the real constraint was raw volume.
My Professional Data Engineer course covers throughput and latency tradeoffs across every major data service on Google Cloud, with worked examples of how to read exam scenarios and pick the right pipeline architecture.