
Throughput and latency are two of the most fundamental concepts on the Google Cloud Professional Data Engineer exam, and they show up in question stems more often than most candidates expect. The exam will hand you a scenario where a customer wants a pipeline that ingests massive volumes of clickstream events, and then on the very next question it will ask you to design a system where a dashboard refreshes within milliseconds of an event happening. These are not the same problem. Confusing them is one of the easier ways to pick the wrong answer between two services that both look reasonable on the surface.
I want to walk through how I think about these two metrics, how they differ, and how to spot which one a Professional Data Engineer question is actually testing.
Throughput is the amount of data a system can process over a period of time. The typical units are gigabytes per second or records per second. When a pipeline ingests 2 GB/s of incoming events without breaking a sweat, that pipeline has high throughput.
I like the highway analogy here. Throughput is the number of cars that can pass a given point on a highway every second. The more lanes the highway has, the more cars move through at once. A 10-lane highway lets more vehicles pass per second than a 2-lane road, even if every individual car is driving the same speed. In data terms, more parallel workers, more partitions, or more shards translate to higher throughput.
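To make that concrete, here is a back-of-the-envelope sketch in Python. The numbers are ones I am assuming purely for illustration; the point is that aggregate throughput is per-worker rate times the number of parallel workers.

```python
# Back-of-the-envelope throughput math with assumed, illustrative numbers.
per_worker_mb_per_s = 50        # each parallel worker sustains 50 MB/s
workers = 40                    # lanes on the highway

aggregate_mb_per_s = per_worker_mb_per_s * workers   # 2,000 MB/s
daily_tb = aggregate_mb_per_s * 86_400 / 1_000_000   # MB/day -> TB/day

print(f"{aggregate_mb_per_s / 1000:.1f} GB/s, about {daily_tb:.0f} TB/day")
# -> 2.0 GB/s, about 173 TB/day
```

That is how a seemingly modest 50 MB/s per worker turns into the 2 GB/s pipeline from the definition above: you add lanes, not faster cars.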
Services on Google Cloud that get described in terms of throughput include Pub/Sub, Dataflow, Bigtable, and BigQuery for batch loads. When you see a question mention petabytes per day, millions of events per second, or a fleet of IoT devices firing telemetry, the design constraint is almost always throughput.
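When throughput is the constraint, the usual lever is batching and parallelism rather than per-message speed. As a minimal sketch, assuming a hypothetical project and topic name, here is how the Pub/Sub Python client can be tuned to batch more aggressively on the publish side:

```python
from google.cloud import pubsub_v1

# Batch more aggressively: trade a few milliseconds of publish latency
# for much higher aggregate throughput.
batch_settings = pubsub_v1.types.BatchSettings(
    max_messages=1000,       # up to 1,000 messages per batch
    max_bytes=1024 * 1024,   # or 1 MB, whichever fills first
    max_latency=0.05,        # wait at most 50 ms for a batch to fill
)
publisher = pubsub_v1.PublisherClient(batch_settings=batch_settings)

# "my-project" and "clickstream-events" are placeholder names.
topic_path = publisher.topic_path("my-project", "clickstream-events")

future = publisher.publish(topic_path, data=b'{"event": "click"}')
print(future.result())  # server-assigned message ID once the batch lands
```

Notice the tradeoff baked into those settings: each individual message may sit up to 50 ms waiting for its batch, which is exactly the throughput-over-latency posture these services are built around.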
Latency is the delay between when data is ingested and when it becomes available for querying or further processing. Low latency means that delay is minimal and data is usable in near real-time. A system where ingested data becomes queryable within a few milliseconds is a low-latency system.
Back to the highway analogy. You exit the highway and hit a traffic light at the intersection. That pause at the light is latency. It does not matter how many lanes the highway had. What matters is how long any single car waits at the off-ramp before it can keep moving.
On Google Cloud, low-latency reads typically come from Bigtable, Memorystore, or Spanner. For analytics, BigQuery streaming inserts give you near real-time queryability of new rows. When a question talks about sub-second response times, real-time fraud detection, or dashboards that refresh as events happen, the design constraint is latency.
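As a minimal sketch of that last point, assuming a hypothetical project, dataset, and table, a BigQuery streaming insert with the Python client looks like this:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")    # hypothetical project
table_id = "my-project.analytics.click_events"    # hypothetical table

rows = [
    {"user_id": "u123", "event": "click", "event_ts": "2024-06-01T12:00:00Z"},
]

# Streamed rows become queryable in near real time, unlike batch loads,
# which optimize for volume over freshness.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print(f"Streaming insert errors: {errors}")
```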
This is the trap. The exam loves to test whether you understand that these two metrics are independent. A system can have high throughput and high latency at the same time. A system can also have low latency and low throughput.
The picture that locked this in for me is the toll plaza in China with more than 40 toll booths. Every individual car waits a long time at the booth, so the latency is high. But because there are so many booths operating in parallel, the total number of cars processed per hour is enormous. High latency, high throughput, same system.
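The toll plaza is really just Little's Law in disguise: in steady state, throughput equals concurrency divided by per-item latency. A quick sketch, with numbers I am inventing for illustration:

```python
# Little's Law in steady state: throughput = concurrency / latency.
booths = 40            # parallel toll booths
wait_seconds = 120     # each car spends two minutes at a booth: high latency

cars_per_second = booths / wait_seconds
print(f"{cars_per_second * 3600:.0f} cars per hour")   # 1200 cars/hour
```

Add booths and throughput climbs even though no individual car gets through any faster. Speed up each booth and latency drops even if the plaza is a single lane. The two knobs are independent.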
When I read a scenario question, I look for the verb and the unit. If the prompt is talking about volume per unit time, I am in throughput territory. If the prompt is talking about freshness, staleness, or time-to-availability, I am in latency territory.
A few patterns worth memorizing:

- Petabytes per day, millions of events per second, fleets of IoT devices firing telemetry: the constraint is throughput, and the answers tend to involve Pub/Sub, Dataflow, Bigtable, or BigQuery batch loads.
- Sub-second response times, real-time fraud detection, dashboards that refresh as events happen: the constraint is latency, and the answers tend to involve Bigtable, Memorystore, Spanner, or BigQuery streaming inserts.
- Nightly reports and scheduled batch jobs: latency barely matters, so optimize for throughput and cost.
The trick is to not pattern-match on the data volume alone. A question that mentions 5 TB of daily data does not automatically demand a low-latency service. Look for what the consumer of the data needs. If the consumer is a nightly report, latency does not matter. If the consumer is a live dashboard or a real-time alerting system, latency is the whole game.
Before I commit to an answer on a Professional Data Engineer scenario, I ask myself two questions in order. First, how much data per second does this system have to absorb? Second, how fresh does the output have to be? Those two answers point me at the right service combination almost every time, and they keep me from picking a low-latency service when the real constraint was raw volume.
My Professional Data Engineer course covers throughput and latency tradeoffs across every major data service on Google Cloud, with worked examples of how to read exam scenarios and pick the right pipeline architecture.