
Batch prediction is one of those Vertex AI features that shows up on the Professional Cloud Architect exam in a fairly predictable pattern. The exam describes a scenario with a large dataset and no real-time latency requirement, then asks which prediction approach to use. The answer is almost always batch prediction, and the question that follows is usually about where the input data lives and where the output goes. That is what I want to walk through here, because the input and output options are the part of batch prediction the exam actually tests.
Batch prediction in Vertex AI is a feature you can use to process large datasets in bulk against a trained model. It is built for high-volume, non-real-time prediction tasks. You hand Vertex AI a pile of input rows, point it at a model, point it at an output destination, and let the job run. There is no deployed endpoint involved. You only pay for the compute the job consumes while it is running.
That last point is worth emphasizing because it shapes how the Professional Cloud Architect exam frames batch prediction questions. If a scenario describes a workload that runs on a schedule, processes a huge volume of records, and does not need millisecond response times, the cost-effective answer is batch prediction. Spinning up an online endpoint to process the same data would cost more because the endpoint has to be provisioned and ready to respond at all times.
Vertex AI batch prediction accepts input data from a few specific sources, and the exam expects you to recognize them. There are two combinations to know, each pairing a format with a location.
The first option is a BigQuery table. You give the batch prediction job a fully qualified table reference and the job reads rows from that table. This fits naturally when your input data already lives in BigQuery, which is common for analytics workloads where the dataset you want to score is already the output of a query or a transformation pipeline. There is no export step. The job pulls directly from the table.
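As a concrete illustration, the fully qualified reference takes the form of a `bq://` URI. The project, dataset, and table names below are placeholders, not real resources:

```python
# Hypothetical identifiers for illustration only.
project_id = "my-analytics-project"
dataset_id = "scoring"
table_id = "customers_to_score"

# Vertex AI batch prediction expects BigQuery sources as a bq:// URI
# in the shape bq://PROJECT.DATASET.TABLE.
bigquery_source = f"bq://{project_id}.{dataset_id}.{table_id}"
print(bigquery_source)  # bq://my-analytics-project.scoring.customers_to_score
```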
The second option is a JSONL file in Cloud Storage. JSONL stands for JSON Lines, which means a file where each line is a separate JSON object representing one prediction request. You drop the file in a Cloud Storage bucket and point the batch prediction job at the bucket path. This fits scenarios where the data is being produced by an upstream system that writes to Cloud Storage, or where you want a flat-file format that is easy to inspect and version.
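To make the format concrete, here is a small sketch that writes and reads a JSONL input file. The field names are placeholders standing in for whatever schema your model was trained on:

```python
import json

# Three hypothetical prediction instances; the fields are made up
# for illustration and would match your model's input schema.
instances = [
    {"customer_id": "c-001", "tenure_months": 34, "monthly_spend": 120.50},
    {"customer_id": "c-002", "tenure_months": 3, "monthly_spend": 49.00},
    {"customer_id": "c-003", "tenure_months": 18, "monthly_spend": 88.75},
]

# JSONL means one JSON object per line, with no enclosing array.
with open("input.jsonl", "w") as f:
    for instance in instances:
        f.write(json.dumps(instance) + "\n")

# Reading it back line by line mirrors how the batch job consumes the file.
with open("input.jsonl") as f:
    parsed = [json.loads(line) for line in f]
```

The line-per-object layout is what makes the format easy to inspect: you can open the file, read any single line, and see one complete prediction request.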
You can also upload a JSONL file directly through the Vertex AI batch prediction interface, which is the same idea as the Cloud Storage option but routed through the console.
The output side mirrors the input side closely, which makes the model easy to remember. Predictions can be written to a BigQuery table or to JSONL files in Cloud Storage.
If you write output to BigQuery, the predictions land in a destination table you specify, ready to be joined against your other tables for downstream analysis. This is the right choice when the predictions are going to feed dashboards, reports, or further SQL transformations. There is no intermediate file for someone to load.
If you write output to Cloud Storage as JSONL, you get one or more JSONL files in your destination bucket. Each line contains the original input plus the prediction the model produced. This fits scenarios where the predictions are going to be consumed by an external system, archived for compliance, or handed off to a process that does not speak SQL.
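As a sketch of what consuming that output looks like, the lines below follow the instance-plus-prediction shape described above. The field names and values are invented for illustration; the exact prediction schema depends on the model:

```python
import json

# Two example output lines pairing the original instance with the
# model's prediction. All values here are made up for illustration.
output_lines = [
    '{"instance": {"customer_id": "c-001"}, "prediction": {"churn_probability": 0.82}}',
    '{"instance": {"customer_id": "c-002"}, "prediction": {"churn_probability": 0.11}}',
]

# Parse each line and keep the customers most likely to churn.
scored = [json.loads(line) for line in output_lines]
at_risk = [
    row["instance"]["customer_id"]
    for row in scored
    if row["prediction"]["churn_probability"] > 0.5
]
print(at_risk)  # ['c-001']
```

Because each line carries the original input alongside the prediction, a downstream consumer never has to join the output back to the input file to know which record a prediction belongs to.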
The Professional Cloud Architect exam likes to test whether you can match input and output formats to a scenario rather than asking you to recall every option in the abstract. The pattern to remember is straightforward. BigQuery tables in, BigQuery tables out. JSONL in Cloud Storage in, JSONL in Cloud Storage out. You can mix the directions, meaning a job can read from BigQuery and write to Cloud Storage or read from Cloud Storage and write to BigQuery, but the format choices on each side are limited to those options.
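The mixed-direction case can be sketched with the `google-cloud-aiplatform` SDK. This is a best-effort sketch, not a verified recipe: the project, bucket, and model names are placeholders, and it assumes the SDK is installed and authenticated:

```python
def run_batch_prediction(project: str, location: str, model_id: str) -> None:
    """Sketch of a mixed-direction batch prediction job: read input rows
    from BigQuery, write JSONL predictions to Cloud Storage. All resource
    names below are hypothetical placeholders."""
    # Assumes the google-cloud-aiplatform package is installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    model = aiplatform.Model(model_name=model_id)

    # No deployed endpoint is involved; the job provisions compute,
    # runs, and tears it down, so you pay only while it runs.
    model.batch_predict(
        job_display_name="monthly-churn-scoring",
        bigquery_source="bq://my-project.scoring.customers_to_score",
        instances_format="bigquery",
        gcs_destination_prefix="gs://my-bucket/churn-output/",
        predictions_format="jsonl",
        machine_type="n1-standard-4",
    )
```

Swapping the direction is a matter of swapping the source and destination arguments: `gcs_source` with `instances_format="jsonl"` on the input side, or `bigquery_destination_prefix` with `predictions_format="bigquery"` on the output side.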
When a question describes input and output destinations, I work through the scenario by asking two things. Where does the input data already live? Where does the output need to end up? If the answer to either is BigQuery, the BigQuery option is correct for that side. If the answer is a flat-file system or an external consumer, JSONL in Cloud Storage is correct for that side. The exam will sometimes try to bait you with options that mention CSV or Avro for batch prediction input. Those are not the formats the exam expects you to choose. JSONL and BigQuery tables are the answer.
It helps to anchor batch prediction to the kind of workload the exam describes. Three patterns come up consistently.
The first is customer churn prediction run on a monthly or quarterly cadence. You take your full customer base, score every customer for likelihood to churn, and feed the results to a retention team. The dataset is large, the cadence is periodic, and there is no real-time requirement. Batch prediction fits cleanly.
The second is demand forecasting run weekly across a product catalog. You score thousands of products against a forecasting model and feed the results to inventory and supply chain systems. Same pattern. Large dataset, periodic schedule, no need for millisecond responses.
The third is sales lead scoring against a CRM database. You score every lead, rank them, and hand a prioritized list to the sales team. Again, large dataset, periodic schedule, no real-time component.
If a scenario looks like one of these three patterns, the Professional Cloud Architect exam is steering you toward batch prediction.
The exam will also test whether you can recognize when batch prediction is not the right fit. The signal is real-time latency. If a scenario describes a user action that needs an immediate prediction, like a fraud check at the point of a transaction or a recommendation served on a page load, batch prediction is wrong. That is online prediction territory, where you have a deployed endpoint that responds to individual requests in milliseconds.
The other signal is small-scale ad hoc work. Batch prediction is built for high volume. If a scenario describes scoring a handful of records, the architecture overhead of a batch job is not justified.
If you want a deeper, structured walkthrough of Vertex AI batch prediction in the context of the Professional Cloud Architect exam, the Professional Cloud Architect course at GCP Study Hub covers it alongside the rest of the ML and AI material you need to know for the exam.