Batch vs Streaming Data Processing for the Professional Cloud Database Engineer Exam

GCP Study Hub
May 2, 2026

Batch and streaming are two ways of processing data, and they differ mainly in when and how the data is handled. Batch processing works on data in large, scheduled chunks. Streaming processing works on data continuously as it arrives. Most questions about this distinction on the Professional Cloud Database Engineer exam come down to matching a scenario to the right approach, and then matching that approach to the Google Cloud service that supports it. Knowing the vocabulary and the trade-offs is usually enough to tell the answers apart.

Batch processing and bounded data

Batch processing means data is collected first and then processed all at once in large chunks. These jobs typically run at set intervals, such as hourly, daily, or weekly, but they are not always scheduled on a clock. They can also be triggered when a condition or threshold is met, or run manually when someone decides it is time to process. The defining characteristic is that the data is gathered into a defined set before processing begins.

Batch processing fits situations where data does not need to be acted on instantly. Common examples are financial reporting, inventory management, and customer billing. In all of these, the data is accumulated over a period and then processed together. Batch data is sometimes called bounded data, because it is a defined, finite set collected before the job runs.
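To make the idea of bounded data concrete, here is a minimal Python sketch of a billing-style batch job. The function name, customer IDs, and amounts are invented for illustration. The point is only that the whole input exists before the job starts.

```python
from collections import defaultdict

def run_billing_batch(usage_records):
    """Process a bounded set of usage records in one pass; return totals per customer."""
    totals = defaultdict(float)
    for customer_id, amount in usage_records:
        totals[customer_id] += amount
    return dict(totals)

# The full, finite input is collected before the job runs: that is what makes it bounded.
collected_usage = [("cust-a", 12.50), ("cust-b", 3.00), ("cust-a", 7.50)]
invoice_totals = run_billing_batch(collected_usage)
print(invoice_totals)
```

Nothing is known about any customer until the job has run over the entire set, which is the delay that batch processing accepts in exchange for simplicity.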

Streaming processing and unbounded data

Streaming processing handles data continuously, in real time or near real time, as it comes in. Instead of collecting data first and processing it later, streaming works with each piece of data immediately upon arrival. This suits scenarios where a rapid response matters. Fraud detection is one, where the goal is to flag suspicious activity as soon as possible. IoT sensor data is another, where devices are constantly sending information. System monitoring is a third, where immediate insight is needed to keep systems running.

Streaming data is sometimes called unbounded data, because it keeps arriving with no defined end to the set and is processed as it flows in rather than after it accumulates into a batch. The contrast between bounded and unbounded data is the cleanest way to keep the two models straight, and the exam uses that framing.
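An unbounded source can be sketched in Python with a generator that never finishes. The example below is a toy fraud check, not a real pipeline. The source, the amounts, and the threshold are all hypothetical.

```python
import itertools

def transaction_stream():
    """Hypothetical unbounded source: yields transaction amounts with no defined end."""
    amounts = [20.0, 35.0, 9800.0, 12.0]
    for amount in itertools.cycle(amounts):
        yield amount

def flag_suspicious(stream, threshold):
    """Examine each event as it arrives and yield the ones above the threshold."""
    for amount in stream:
        if amount > threshold:
            yield amount

# The source never ends, so the consumer decides how long to keep reading.
flagged = flag_suspicious(transaction_stream(), threshold=5000.0)
alerts = list(itertools.islice(flagged, 2))
print(alerts)
```

Each amount is checked the moment it is produced. There is never a point at which all the data has been collected, so the code has to limit its own reading with `islice`.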

Trade-offs between the two

Each approach has strengths and weaknesses. Batch processing is efficient for large volumes of data, since it deals with everything in scheduled chunks rather than one record at a time. Its downside is that there can be a delay in insights, because you wait for all the data to be gathered before it is processed.

Streaming has the opposite profile. Its main advantage is low latency, which enables quick decision making when you need to act on data the moment it arrives. The cost of that is resources. Streaming is more resource-intensive and requires robust infrastructure to handle a continuous flow of data in real time, which makes it more complex and demanding to run.
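The latency side of the trade-off can be shown with simple arithmetic. In this sketch the arrival times, the batch run time, and the per-record processing delay are made-up numbers chosen only to show the shape of the difference.

```python
def batch_latencies(arrival_times, job_time):
    """Delay between each record's arrival and the single batch run that processes it."""
    return [job_time - t for t in arrival_times]

def streaming_latencies(arrival_times, per_record_delay):
    """Each record is handled on arrival, so its delay is only the processing time."""
    return [per_record_delay for _ in arrival_times]

arrivals_min = [0, 15, 30, 45]  # minutes past the hour (illustrative values)
batch = batch_latencies(arrivals_min, job_time=60)
stream = streaming_latencies(arrivals_min, per_record_delay=1)
print(batch)   # the earliest record waits longest for the batch run
print(stream)  # every record waits only for its own processing
```

What the sketch leaves out is the cost of keeping the streaming path running continuously, which is the resource side of the same trade-off.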

Choosing between batch and streaming

A few considerations drive the decision. The first is data volume, meaning how much data is being processed. A very large amount can point toward batch, while smaller, steady increments can suit streaming. The second is speed of analysis, meaning how soon insights are needed. If a real-time reaction is required, streaming is essential, and if it is acceptable to wait, batch can be more efficient. The third is infrastructure, meaning whether enough resources are available to support continuous real-time processing.

The choice is not always one or the other. A hybrid approach combines both, and is sometimes the best option. A common pattern is to send real-time alerts for urgent situations using streaming, while also generating detailed reports on the accumulated data later using batch. As for the broader trend, streaming has become more common as more workloads need real-time insight, but batch processing still has value where large volumes of data can be processed efficiently and a delay in results is acceptable.
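The hybrid pattern can be sketched as one pass that feeds two paths. The sensor IDs, values, and threshold below are hypothetical, and a real system would use separate services for each path rather than two Python lists.

```python
def process_hybrid(readings, alert_threshold):
    """Raise alerts as each reading arrives and keep every reading for a later batch report."""
    alerts, stored = [], []
    for sensor_id, value in readings:
        if value > alert_threshold:        # streaming path: react immediately
            alerts.append((sensor_id, value))
        stored.append((sensor_id, value))  # batch path: accumulate for later
    return alerts, stored

def batch_report(stored):
    """Summarize the accumulated readings in one pass."""
    values = [value for _, value in stored]
    return {"count": len(values), "max": max(values), "mean": sum(values) / len(values)}

readings = [("s1", 70), ("s2", 105), ("s1", 80), ("s2", 65)]
alerts, stored = process_hybrid(readings, alert_threshold=100)
report = batch_report(stored)
print(alerts)
print(report)
```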

Which Google Cloud services fit each mode

For the Professional Cloud Database Engineer exam, it helps to know which services lean toward streaming, which lean toward batch, and which handle both.

On the streaming side, Pub/Sub is built for real-time data ingestion and delivers data as soon as it is available, which makes it a foundational component for streaming pipelines. Datastream is a serverless change data capture, or CDC, service that continuously replicates data changes from databases into Google Cloud, and it is commonly used for near real-time replication and ingestion.
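The idea behind change data capture can be illustrated without any cloud service. The sketch below replays a list of change events against an in-memory replica. The event shape (`op`, `key`, `row`) is an assumption made up for this example and is not Datastream's actual output format.

```python
def apply_change(replica, event):
    """Apply one change event (insert, update, or delete) to an in-memory replica."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("insert", "update"):
        replica[key] = row
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return replica

# Hypothetical change events, in the order they happened on the source database.
change_events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]
replica = {}
for event in change_events:
    apply_change(replica, event)
print(replica)
```

Because only the changes are shipped, the replica stays close to the source without repeatedly copying the whole table, which is why CDC is grouped with streaming rather than batch.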

On the batch side, the managed service for Apache Spark is suited to batch processing of large datasets, where data is processed in scheduled chunks. The managed service for Apache Airflow is a workflow orchestration service typically used to manage batch workloads that have specific dependencies over time.
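Orchestration is largely about running tasks in an order that respects their dependencies. As a rough illustration, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) rather than Airflow itself. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields the tasks so that every dependency comes before its dependents.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering, but the dependency graph is the core concept to recognize in exam scenarios.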

Some services handle both modes. Dataflow is among the most versatile, built to run both real-time streams and batch workloads. It is Google's managed service for running pipelines written with the open source Apache Beam SDK, whose name comes from combining the words batch and stream. BigQuery also fits both categories, since it supports streaming inserts for near real-time analytics as well as batch queries over large-scale data. When a question describes a service that can do either, Dataflow and BigQuery are the ones to keep in mind.
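The appeal of a unified model is that the same transform logic can run over bounded or unbounded input. The sketch below shows that idea in plain Python. It does not use the Apache Beam API, and the function name and inputs are invented for illustration.

```python
import itertools

def count_by_key(events):
    """Yield running per-key counts; works on any iterable, finite or not."""
    counts = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]

# Bounded input: the transform runs to the end, so final counts are available.
bounded = ["a", "b", "a"]
batch_result = dict(count_by_key(bounded))

# Unbounded input: the same transform emits running counts for as long as we read.
unbounded = itertools.cycle(["a", "b", "a"])
first_five = list(itertools.islice(count_by_key(unbounded), 5))

print(batch_result)
print(first_five)
```

Only the source and the way results are consumed differ between the two runs. The transform itself is unchanged.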

Our Professional Cloud Database Engineer course covers batch and streaming data processing alongside change data capture with Datastream and pipeline design with Dataflow, with practice questions that drill these distinctions.
