Data Movement on Google Cloud for the Professional Cloud Database Engineer Exam

GCP Study Hub
June 28, 2026

Data movement on Google Cloud covers the set of services that get data from one place to another, whether that is migrating a database into a managed service, streaming changes from an operational database into an analytics system, or moving data through a processing pipeline. The Professional Cloud Database Engineer exam tends to test these services against each other, because several of them overlap in what they can technically do while differing in what they are actually built for. Most of the questions come down to matching the requirement, the source, and the destination to the one tool that fits, and setting aside options that would also work but at higher cost, more latency, or more operational overhead.

Database Migration Service

Database Migration Service, or DMS, is the primary tool for moving existing database workloads into Google Cloud. It is serverless, and it transfers both data and metadata from a source to a destination while preserving the integrity of schemas and records. It supports migrations into Cloud SQL and AlloyDB for PostgreSQL. The service automates the initial data snapshot to get the migration started and then manages continuous replication to keep new changes in sync, which is what allows a cutover with minimal downtime.

Two pairs of terms describe any DMS migration, and a given migration is classified along both at once. The first pair is continuous versus one-time. A continuous migration, also called an online migration, performs the initial full dump and load and then keeps synchronizing changes from the source until you are ready to switch the application over. Because the source stays live until the final moment, downtime is minimal, and at the end the service promotes the destination to a standalone primary. A one-time migration takes a single point-in-time snapshot and loads a fixed dump. It requires stopping writes on the source before the snapshot so the data stays consistent, which means the database is effectively frozen and downtime is longer.

The second pair is homogeneous versus heterogeneous. A homogeneous migration keeps the same database engine on both ends, such as on-premises MySQL into Cloud SQL for MySQL, which keeps schema mapping straightforward. A heterogeneous migration connects different engines, such as Oracle into PostgreSQL, and that requires converting schema and code so the destination can run the original logic. For those conversions, DMS provides conversion workspaces, and it integrates Gemini to help translate complex objects between dialects. When a scenario involves refactoring stored procedures or triggers while staying inside Google Cloud, the integrated conversion workspace is generally the intended answer rather than manual rewrites or external tooling.
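
Because the two axes are independent, every migration gets one label from each pair. The following is a minimal sketch of that classification. The names `MigrationPlan` and `classify` are hypothetical and are not part of any DMS API, and engine names are simplified to plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationPlan:
    """Hypothetical description of a migration; not a DMS API object."""
    source_engine: str        # e.g. "mysql", "oracle", "postgresql"
    destination_engine: str   # engine of the Cloud SQL or AlloyDB target
    keep_source_live: bool    # True if writes must continue until cutover


def classify(plan: MigrationPlan) -> dict:
    """Label a migration along both axes described in the text."""
    same_engine = plan.source_engine.lower() == plan.destination_engine.lower()
    return {
        "timing": "continuous" if plan.keep_source_live else "one-time",
        "engines": "homogeneous" if same_engine else "heterogeneous",
        # Schema and code conversion is only needed across different engines.
        "needs_conversion_workspace": not same_engine,
    }


# On-premises MySQL into Cloud SQL for MySQL, source stays live until cutover.
on_prem_mysql = classify(MigrationPlan("mysql", "mysql", keep_source_live=True))

# Oracle into PostgreSQL from a single frozen snapshot.
oracle_to_pg = classify(MigrationPlan("oracle", "postgresql", keep_source_live=False))
```

Here `on_prem_mysql` is labeled continuous and homogeneous, while `oracle_to_pg` is labeled one-time and heterogeneous and is flagged as needing a conversion workspace.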

Datastream

Datastream is a fully managed, serverless change data capture and replication service. It captures changes from a source database and applies them to a destination in near real time, which makes it the usual fit when an on-premises or operational database needs to feed analytics with low latency. A common pattern is replicating an Oracle, MySQL, or PostgreSQL source into Google Cloud for reporting.

The detail worth holding onto for the exam is the set of supported destinations. Datastream writes to BigQuery and Cloud Storage. It does not replicate directly into Cloud SQL, so an answer choice that proposes Datastream on its own as a path into Cloud SQL is incorrect. That single constraint separates Datastream from DMS in many scenarios: when the goal is migrating into a managed relational database, DMS is the tool, and when the goal is streaming changes into an analytics destination, Datastream is the tool.

Dataflow

Dataflow is Google's managed service for running Apache Beam pipelines, and the name Beam comes from combining batch and stream. That captures what the service does: it handles both batch and streaming workloads with a single programming model, which removes the older problem of maintaining one pipeline for fast data and a separate one for accurate historical data. Dataflow is autoscaling and serverless, and it integrates natively with Cloud Storage, Pub/Sub, and BigQuery, with connectors available for Bigtable and Apache Kafka.

Dataflow is suited to heavy data pipelines and transformations. For small or straightforward jobs, such as a simple scheduled export, Cloud Run functions are usually the better choice, and pulling in Dataflow there adds cost and complexity for no benefit. Dataflow also appears in migration scenarios where purpose-built connectors can replicate between systems with low operational overhead.

Managed Service for Apache Spark, Data Fusion, and Pub/Sub

Managed Service for Apache Spark, formerly called Dataproc, is the managed, on-demand version of Apache Hadoop and Apache Spark. It is the right choice for workloads that already have dependencies in the Hadoop or Spark ecosystem. On the Professional Cloud Database Engineer exam, though, Dataflow is frequently the better answer over this service because Dataflow is serverless and supports streaming, so unless a question specifically points to existing Spark or Hadoop dependencies, the serverless option tends to win.

Cloud Data Fusion is a fully managed data integration service with a point-and-click interface, effectively a no-code ETL and ELT tool. It can connect to other clouds, SaaS products, and on-premises systems, and it helps integrate data into lakes such as Cloud Storage and warehouses such as BigQuery. It fits teams that want to build pipelines visually rather than write code. For low-latency change capture into BigQuery, a dedicated change data capture service such as Datastream is generally preferred, because Data Fusion's scheduled or micro-batch approach introduces more latency than a continuous capture path.

Pub/Sub is a global-scale messaging service that acts as a buffer between senders and receivers. By placing a queue between them, it converts a tightly coupled system, where a slow or failed receiver can break the whole flow, into a loosely coupled one that is more fault tolerant and scalable. It is commonly used for data ingestion and frequently feeds a pipeline service like Dataflow. A recurring pattern worth recognizing is Cloud Scheduler publishing a message to a Pub/Sub topic that triggers a Cloud Run function, a fully managed way to run scheduled work without keeping a virtual machine running.
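
The decoupling idea can be illustrated without any cloud dependency. The sketch below uses Python's standard `queue.Queue` as an in-process stand-in for the buffer. It is not the Pub/Sub client library, and the function name `run_decoupled` and its parameters are assumptions made for illustration. The point it shows is that the sender finishes publishing without waiting for a slow receiver.

```python
import queue
import threading
import time


def run_decoupled(messages, receiver_delay=0.01):
    """Send messages through a buffer to a slow receiver on another thread.

    Returns the messages the receiver processed and how long the sender
    spent publishing.
    """
    buffer = queue.Queue()  # stands in for the queue between sender and receiver
    received = []

    def receiver():
        while True:
            msg = buffer.get()
            if msg is None:  # sentinel: the sender has no more messages
                break
            time.sleep(receiver_delay)  # simulate slow processing
            received.append(msg)

    worker = threading.Thread(target=receiver)
    worker.start()

    start = time.monotonic()
    for msg in messages:
        buffer.put(msg)  # returns immediately; the sender never waits
    publish_seconds = time.monotonic() - start

    buffer.put(None)
    worker.join()  # wait for the receiver to drain the buffer
    return received, publish_seconds


received, publish_seconds = run_decoupled(["order-1", "order-2", "order-3"])
```

The sender's loop completes in roughly the time it takes to enqueue the messages, regardless of `receiver_delay`. A local queue only models the buffering; a managed messaging service adds durability and scale that this sketch does not attempt to reproduce.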

Choosing the right tool

The way to read these questions is to anchor on the source, the destination, and the constraint. Migrating a database into Cloud SQL or AlloyDB points to DMS, and the continuous versus one-time and homogeneous versus heterogeneous labels then tell you which mode applies and whether a conversion workspace is involved. Streaming changes from an operational database into BigQuery or Cloud Storage points to Datastream. Heavy batch and streaming transformations point to Dataflow, while existing Spark or Hadoop dependencies point to Managed Service for Apache Spark. Visual, no-code integration across many sources points to Data Fusion, and decoupling producers from consumers or buffering ingestion points to Pub/Sub. When more than one service could move the data, the requirements around latency, downtime, cost, and operational effort usually decide which one is correct.
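
The rules of thumb above can be written down as an ordered lookup. This is a study aid under stated assumptions, not an official decision procedure: the `Scenario` fields, the goal strings, and the `choose_service` function are all hypothetical names introduced here, and real exam questions carry more nuance than five goals can express.

```python
from dataclasses import dataclass

RELATIONAL_TARGETS = {"cloud sql", "alloydb"}
ANALYTICS_TARGETS = {"bigquery", "cloud storage"}


@dataclass(frozen=True)
class Scenario:
    """Hypothetical summary of an exam scenario."""
    # One of: "migrate", "stream_changes", "transform",
    # "integrate_visually", "decouple".
    goal: str
    destination: str = ""
    has_spark_or_hadoop_deps: bool = False


def choose_service(s: Scenario) -> str:
    """Map a scenario to the service the text associates with it."""
    dest = s.destination.lower()
    if s.goal == "migrate" and dest in RELATIONAL_TARGETS:
        return "Database Migration Service"
    if s.goal == "stream_changes" and dest in ANALYTICS_TARGETS:
        return "Datastream"
    if s.goal == "transform":
        # Existing ecosystem dependencies override the serverless default.
        if s.has_spark_or_hadoop_deps:
            return "Managed Service for Apache Spark"
        return "Dataflow"
    if s.goal == "integrate_visually":
        return "Cloud Data Fusion"
    if s.goal == "decouple":
        return "Pub/Sub"
    return "no single match; compare latency, downtime, cost, and operational effort"
```

For example, `choose_service(Scenario("stream_changes", "BigQuery"))` returns `"Datastream"`, while the same goal with `"Cloud SQL"` as the destination falls through to the final line, which mirrors the destination constraint described in the Datastream section.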

