
Throughput and latency are two measures of how a database performs, and they describe different things. Throughput is the amount of work a system gets through over a period of time. Latency is how long a single operation takes to complete. The Professional Cloud Database Engineer exam expects you to keep these two apart, because a system can be strong on one and weak on the other, and the right answer to a scenario often depends on which one the workload actually cares about.
Throughput refers to the number of operations a system handles per unit of time, usually expressed as operations, queries, or transactions per second. High throughput means the system sustains a large volume of work. A database that processes thousands of transactions per second would generally be considered to have high throughput, because it completes a large number of operations within a short window.
A useful way to picture it is traffic on a highway. Throughput is like the number of cars that can pass through per second. The more lanes the highway has, the more cars can move through at once, and in the same way a system with high throughput can handle more operations at the same time.
Latency refers to the amount of time it takes for an operation or request to complete. Low latency means that delay is minimal, so requests are processed and results are returned quickly. In a low-latency system, a query might return results within just a few milliseconds.
Staying with the highway, latency is the delay you hit at the intersection right after you take your exit. You may have been moving smoothly, and then you reach a traffic light and wait briefly before continuing. That short delay is what latency describes for a single request.
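To make the two definitions concrete, here is a minimal Python sketch that measures both for the same run of operations. The `measure` helper and the `fake_query` stand-in are hypothetical names invented for this illustration, and the 2 ms sleep is an arbitrary assumption, not a real database call.

```python
import time


def measure(operation, count):
    """Run `operation` `count` times in sequence; return (throughput, latencies)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(count):
        t0 = time.perf_counter()
        operation()
        # Latency: how long this one operation took, in seconds.
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    # Throughput: operations completed per second over the whole run.
    throughput = count / elapsed
    return throughput, latencies


def fake_query():
    # Hypothetical stand-in for a database call (assumed to take about 2 ms).
    time.sleep(0.002)


throughput, latencies = measure(fake_query, 50)
avg_latency_ms = 1000 * sum(latencies) / len(latencies)
print(f"throughput: {throughput:.0f} ops/s, average latency: {avg_latency_ms:.2f} ms")
```

Throughput is computed once for the whole run, while latency is recorded per operation. Because this sketch runs operations one at a time, the two numbers are tightly linked here; they separate once operations run in parallel.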
The point to hold onto is that these are not the same thing, and high throughput does not always mean low latency. A system can be built to handle a large volume of operations and still take a long time to return the result for each individual request. That is high throughput together with high latency. The opposite case also exists. A system where each operation completes quickly has low latency, but that alone does not guarantee it can handle many operations at once. Such a system has low latency together with low throughput.
The traffic picture makes the combination clearer. Even if there is a long delay at the light, which represents high latency, a highway with many lanes can still let many cars pass over time, which represents high throughput. Conversely, a road with a short delay but few lanes gets each car through quickly, yet the total number passing through over a period is lower, meaning low latency with low throughput. A real example is a highway with more than forty toll lanes. Each car waits a noticeable amount of time to pass through a toll, so latency per car is high, but because so many lanes operate in parallel a very large number of cars still get through overall, so throughput is high.
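The toll-lane example can be worked through with simple arithmetic. The sketch below assumes every lane is always occupied and every car takes the same time at its booth. The lane counts and service times are made-up numbers chosen for illustration, and `toll_plaza` is a hypothetical helper.

```python
def toll_plaza(lanes, seconds_per_car):
    """Return (latency_seconds, cars_per_second) for a fully busy plaza."""
    # Latency: the time one car spends getting through its booth.
    latency = seconds_per_car
    # Throughput: each lane releases one car every `seconds_per_car` seconds,
    # and all lanes work in parallel.
    throughput = lanes / seconds_per_car
    return latency, throughput


# Assumed numbers: a wide, slow plaza versus a single fast lane.
wide_latency, wide_throughput = toll_plaza(lanes=40, seconds_per_car=20)
fast_latency, fast_throughput = toll_plaza(lanes=1, seconds_per_car=2)

print(f"40 lanes: {wide_latency} s per car, {wide_throughput} cars/s")
print(f" 1 lane:  {fast_latency} s per car, {fast_throughput} cars/s")
```

Under these assumptions the wide plaza is ten times slower per car, yet moves four times as many cars per second. The relationship used here, concurrency equals throughput multiplied by latency, is commonly known as Little's Law. It is one way to see why adding parallelism can raise throughput without lowering latency.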
For the Professional Cloud Database Engineer exam, the value of separating these two metrics is that it tells you which one a given workload is sensitive to. A workload that ingests a high volume of writes cares mostly about throughput, while an interactive workload where a user is waiting on each response cares mostly about latency. Because the two can move independently, a design that improves one does not automatically improve the other. A question that states a requirement in operations per second is asking about something different from a question that states a requirement in milliseconds per request. Reading the scenario for which measure is stated keeps you from optimizing for the wrong one.
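As a rough illustration of reading a requirement by its unit, the sketch below checks two candidate designs against one throughput requirement and one latency requirement. The designs, their measurements, and the thresholds are all hypothetical values invented for this example, as is the `meets` helper.

```python
def meets(requirement, measured):
    """Check a measured design against one stated requirement."""
    kind, limit = requirement
    if kind == "ops_per_second":
        # Throughput requirement: the design must reach at least the limit.
        return measured["ops_per_second"] >= limit
    if kind == "ms_per_request":
        # Latency requirement: the design must stay at or under the limit.
        return measured["ms_per_request"] <= limit
    raise ValueError(f"unknown requirement kind: {kind}")


# Hypothetical measurements for two candidate designs.
batch_oriented = {"ops_per_second": 50_000, "ms_per_request": 120}
interactive = {"ops_per_second": 3_000, "ms_per_request": 4}

ingest_requirement = ("ops_per_second", 20_000)  # "sustain 20,000 writes per second"
lookup_requirement = ("ms_per_request", 10)      # "respond within 10 ms"

for name, design in [("batch_oriented", batch_oriented), ("interactive", interactive)]:
    print(name, meets(ingest_requirement, design), meets(lookup_requirement, design))
```

With these assumed numbers, each design satisfies one requirement and fails the other, which is the situation the exam scenarios are probing: the unit in the requirement decides which design is the right answer.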
Our Professional Cloud Database Engineer course covers throughput and latency alongside read and write scaling patterns, with practice questions that drill these distinctions.