The choice between batch and streaming comes down to one question: how quickly do you need to act on the data? If the answer is "immediately," you need streaming. If waiting hours or overnight is acceptable, batch works fine. That single distinction drives most of the data processing architecture decisions that appear on the Associate Cloud Engineer exam.
Batch processing collects data over a period of time and then processes it all at once in a scheduled job. The data is bounded, meaning you know how much there is before you start. A nightly payroll run, a daily inventory reconciliation, a weekly sales report. You gather everything that accumulated during the period, run the job, and produce the output.
The advantage of batch is efficiency. Processing large volumes of data in one pass is often cheaper and simpler than processing each record as it arrives. The trade-off is latency. If your batch job runs at midnight, the insights from that data are not available until morning. For reporting and analytics that do not require real-time decisions, that delay is acceptable.
Common batch use cases include financial reporting, customer billing, inventory management, and overnight ETL pipelines that load data into a data warehouse for next-day analysis.
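To make the bounded idea concrete, here is a minimal plain-Python sketch of a batch job. The record list and field names are invented for the example, and a real pipeline would read from files or a warehouse rather than an in-memory list:

```python
from collections import defaultdict

# Hypothetical bounded dataset: one day's sales, fully known before the job
# starts. Amounts are in cents to keep the arithmetic exact.
sales_records = [
    {"product": "widget", "cents": 1999},
    {"product": "gadget", "cents": 450},
    {"product": "widget", "cents": 1999},
]

def run_batch_report(records):
    """Process the whole bounded dataset in a single scheduled pass."""
    totals = defaultdict(int)
    for record in records:  # every record already exists; nothing arrives mid-job
        totals[record["product"]] += record["cents"]
    return dict(totals)

report = run_batch_report(sales_records)
# one pass over the accumulated data produces the per-product totals
```

The whole job runs to completion and stops, which is exactly why batch capacity is cheaper than always-on streaming capacity.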
Stream processing handles data continuously as it arrives, in real time or near real time. The data is unbounded, meaning it keeps arriving and there is no defined endpoint. A stream of IoT sensor readings, a feed of user click events, a flow of financial transactions.
The advantage of streaming is low latency. You can detect a fraudulent transaction within seconds of it occurring, trigger an alert the moment a server metric crosses a threshold, or update a live dashboard as new events arrive. The trade-off is complexity and cost. Streaming infrastructure requires careful design around event ordering, late-arriving data, and stateful processing.
Common streaming use cases include fraud detection, IoT sensor monitoring, real-time dashboards, system alerting, and log analysis.
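The windowing and late-data concerns mentioned above can be sketched in a few lines of plain Python. This is an illustration of the idea, not a streaming framework: the event tuples and the 10-second tumbling window are assumptions for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # fixed (tumbling) windows of 10 seconds

# (event_time, payload) pairs in arrival order; note the event stamped t=2
# arrives late, after the t=12 event, which real streaming systems must tolerate.
arriving_events = [(1, "a"), (3, "b"), (12, "c"), (2, "d"), (25, "e")]

def count_per_window(stream):
    """Assign each event to its window by event time, not arrival order."""
    counts = defaultdict(int)
    for event_time, _payload in stream:
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)

windowed = count_per_window(arriving_events)
# the late event still lands in the [0, 10) window where it belongs
```

Frameworks like Apache Beam formalize this with watermarks and triggers; the point of the sketch is only that correctness depends on event time, not arrival time.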
The Associate Cloud Engineer exam uses the terms bounded and unbounded data. Bounded data has a defined start and end. Unbounded data keeps growing with no defined end. Batch processing naturally fits bounded data. Streaming naturally fits unbounded data.
One reason this distinction matters is that the same processing framework can handle both. Google Cloud Dataflow, GCP's managed service for running Apache Beam pipelines, processes both batch and streaming jobs. The name Apache Beam comes from combining Batch and strEAM, which tells you the framework was designed with both in mind from the start.
For streaming, the most common pattern on GCP is Pub/Sub combined with Dataflow. Pub/Sub is a managed messaging service that ingests real-time event streams. Dataflow reads from Pub/Sub, applies transformations, windowing, and aggregations, and writes results to a destination like BigQuery or Cloud Storage. This combination appears frequently in exam scenarios involving real-time data processing.
Dataflow handles both batch and streaming. When you write a Dataflow pipeline, you write it once and the same code can process a bounded dataset from Cloud Storage in batch mode or an unbounded stream from Pub/Sub in streaming mode. This unified model is one of Dataflow's strengths.
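The unified model can be hinted at without any Beam code: write the transformation once and feed it either a bounded collection or an unbounded-style generator. The record shape below is invented for the sketch; a real Dataflow job would express this with Beam's PCollection API rather than plain iterables.

```python
def enrich(record):
    """The transformation logic, written once for both modes."""
    return {**record, "amount_usd": record["cents"] / 100}

def run_pipeline(source):
    """Consumes any iterable: a list (batch) or a generator (streaming)."""
    for record in source:
        yield enrich(record)

# Batch mode: a bounded dataset, e.g. records loaded from Cloud Storage.
bounded = [{"cents": 250}, {"cents": 1000}]
batch_output = list(run_pipeline(bounded))

# Streaming mode: records yielded as they "arrive", e.g. from a
# Pub/Sub subscription that never ends.
def event_source():
    yield {"cents": 499}

stream_output = list(run_pipeline(event_source()))
```

The same `run_pipeline` body serves both calls, which is the property the exam is pointing at when it describes Dataflow's unified batch and streaming model.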
Cloud Dataproc is GCP's managed Hadoop and Spark service. It is primarily used for batch processing workloads, particularly when you are migrating existing Spark or Hadoop jobs from on-prem to Google Cloud. Dataproc does support Spark Streaming for streaming use cases, but Pub/Sub plus Dataflow is typically the recommended streaming pattern on GCP.
Some architectures use both batch and streaming together. A streaming pipeline might process events in real time and write enriched records to Cloud Storage, while a separate batch job runs nightly to reprocess the full dataset for analytical accuracy. Running those two pipelines side by side is called a lambda architecture; the alternative kappa architecture drops the batch layer and handles everything through a single streaming pipeline, reprocessing by replaying the stream.
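As a rough plain-Python sketch of the lambda pattern (the function names and the uppercase "enrichment" are placeholders): the speed layer serves results within seconds of each event, while the nightly batch layer recomputes from the full stored dataset.

```python
def enrich(event):
    """Shared transform used by both layers (placeholder logic)."""
    return event.upper()

raw_store = []      # long-term storage of raw events (e.g. Cloud Storage)
realtime_view = []  # speed layer's view, updated per event as it arrives

def on_event(event):
    """Streaming path: store the raw event and serve an enriched result now."""
    raw_store.append(event)
    realtime_view.append(enrich(event))

def nightly_batch():
    """Batch path: reprocess the complete dataset for an authoritative view."""
    return [enrich(e) for e in raw_store]

for e in ["a", "b", "c"]:
    on_event(e)
batch_view = nightly_batch()
# both views agree here; in practice the batch view corrects any drift
# (dropped events, late data) that the real-time path accumulated
```

The design choice is that the streaming path optimizes for latency and the batch path for completeness, with both applying the same transform.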
The ACE exam occasionally presents scenarios where neither pure batch nor pure streaming is obviously correct, and you need to recognize that a hybrid design serves both the real-time alerting requirement and the daily reporting requirement.
Exam questions about batch versus streaming typically describe a scenario and ask you to identify the right processing approach or the right GCP service. Key signals in the question text tell you which applies.
Words like "real-time," "immediate," "as it arrives," "low latency," and "continuous" point toward streaming. Words like "daily," "nightly," "scheduled," "at the end of the month," and "large volume at once" point toward batch. When you see Pub/Sub in a streaming scenario, Dataflow is almost always the next piece of the pipeline.
For a complete walkthrough of how these patterns appear in actual exam questions, including the tricky hybrid scenarios, I cover this thoroughly in my Associate Cloud Engineer course.
Most data engineering decisions in the real world involve evaluating the business requirement first, then working backward to the technical implementation. A fraud detection system that cannot alert until the next morning provides almost no value. A monthly financial report that processes in real-time adds cost without adding benefit. The latency requirement is the deciding factor, and the ACE exam scenarios always give you enough context to identify it.
Cost is a secondary consideration. Streaming infrastructure is more complex and often more expensive than batch because it requires always-on processing capacity. A Dataflow streaming job runs continuously. A Dataflow batch job runs to completion and stops. For high-volume, latency-tolerant workloads, batch is frequently the cheaper option by a substantial margin.
When the exam gives you a scenario without a clear latency signal, look for clues in the data volume and the consequence of delay. Billions of events per day with no time-sensitive action on any individual event leans toward batch. A few million events per hour where each event might trigger an automated action leans toward streaming.