Datastream: Change Data Capture and Serverless Replication

GCP Study Hub
May 10, 2026

Datastream is a fully managed, serverless change data capture and data replication service on Google Cloud. Change data capture, or CDC, means it watches a source database for changes and applies those changes to a destination as they happen, rather than copying the whole dataset on a schedule. The Professional Cloud Database Engineer exam tends to test two things about it. The first is what kind of problem it is meant to solve. The second is where it can actually send data, because that detail is narrower than people expect and several wrong answers are built on getting it wrong.

What Datastream is for

Datastream focuses on real-time data replication between a source and a destination. It is fully managed and serverless, so there is no infrastructure for you to provision or scale. The case it fits best is an organization that has a database somewhere and needs the data to show up in Google Cloud continuously, with changes flowing through rather than arriving in nightly batches.

A common pattern is replicating an on-premises database into Google Cloud for analytics and reporting. The typical example is an on-prem Oracle database replicated to BigQuery, where the operational data becomes available for analytics and reporting without the analytics workload touching the source system. Datastream captures changes in the source database and applies them to the destination in near real time, so the copy stays current as the source changes.
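To make the idea of applying changes rather than recopying the dataset concrete, here is a minimal sketch in Python. It is not Datastream's API or its event format. The ChangeEvent shape, the operation names, and the in-memory replica are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change, in a hypothetical simplified shape."""
    op: str                      # "INSERT", "UPDATE", or "DELETE"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents; None for DELETE


def apply_change(replica: Dict[int, dict], event: ChangeEvent) -> None:
    """Apply a single change event to a replica keyed by primary key."""
    if event.op in ("INSERT", "UPDATE"):
        if event.row is None:
            raise ValueError(f"{event.op} event needs a row")
        # Store a copy so later edits to the event's row do not leak in.
        replica[event.key] = dict(event.row)
    elif event.op == "DELETE":
        # Deleting a key that is already absent is treated as a no-op.
        replica.pop(event.key, None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


def replicate(replica: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Apply events in order; only changed rows are touched."""
    for event in events:
        apply_change(replica, event)
    return replica


if __name__ == "__main__":
    # The replica starts from an earlier copy of the source table.
    replica = {1: {"name": "Ada"}}
    events = [
        ChangeEvent("INSERT", 2, {"name": "Grace"}),
        ChangeEvent("UPDATE", 1, {"name": "Ada L."}),
        ChangeEvent("DELETE", 2),
    ]
    print(replicate(replica, events))
```

The point of the sketch is that the replica is brought up to date by replaying three small events in order, not by reloading the table. A managed service handles the capture, ordering, and delivery that this toy version ignores.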

The two supported destinations

This is the part the exam leans on most. Datastream supports two destination services, BigQuery and Cloud Storage. That is the full list. BigQuery is the destination when the goal is analytics and reporting on the replicated data. Cloud Storage is the destination when you want the change stream landed as files, which you can then process or load with other tools.

Because the list is just those two, an answer choice that has Datastream replicating into something else is usually there to be eliminated. If a scenario describes using Datastream to replicate data into Cloud SQL, for example, that choice is incorrect, because Cloud SQL is not a supported Datastream destination. An unsupported destination is a quick way to rule a choice out. When you see Datastream in a question, check the destination first. If it is not BigQuery or Cloud Storage, the choice is wrong regardless of how reasonable the rest of it sounds.
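The elimination rule can be written down as a small check. This is only a study aid, sketched in Python. The set of destinations is copied from the list given in this article and is not read from any Google Cloud API, and the function names and the sample answer choices are hypothetical.

```python
from typing import Dict, List

# Destinations as stated in this article; an assumption hard-coded for the sketch.
SUPPORTED_DESTINATIONS = frozenset({"bigquery", "cloud storage"})


def is_supported_destination(destination: str) -> bool:
    """Return True if the named service is on the article's destination list."""
    return destination.strip().lower() in SUPPORTED_DESTINATIONS


def choices_to_eliminate(choices: Dict[str, str]) -> List[str]:
    """Given answer label -> Datastream destination, return labels to rule out."""
    return [
        label
        for label, destination in choices.items()
        if not is_supported_destination(destination)
    ]


if __name__ == "__main__":
    # Hypothetical answer choices, each naming where Datastream would write.
    sample = {
        "A": "BigQuery",
        "B": "Cloud SQL",
        "C": "Cloud Storage",
        "D": "Spanner",
    }
    print(choices_to_eliminate(sample))
```

Checking the destination first narrows the field before you read the rest of each choice, which is the order the paragraph above recommends.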

How it tends to come up on the exam

Questions in this area usually present a scenario in which a source database needs to reach Google Cloud with low latency, then ask you to pick the right service or the right configuration. Datastream is the answer when the requirement is ongoing change data capture and replication, especially from an on-prem source into BigQuery or Cloud Storage. When the requirement points somewhere outside those destinations, Datastream stops being the answer. Knowing that it is serverless and fully managed also helps, since requirements that emphasize avoiding infrastructure management line up with it.

For the Professional Cloud Database Engineer exam, the reliable approach is to keep two facts straight. Datastream does serverless change data capture and replication, and it writes to BigQuery and Cloud Storage only. Most of the distinctions the questions ask you to make come down to those two points.

Our Professional Cloud Database Engineer course covers Datastream alongside BigQuery and on-prem to cloud migration patterns, with practice questions that drill these distinctions.
