Importing Data Into Cloud SQL for the PDE Exam

GCP Study Hub
May 9, 2026

Cloud SQL import questions show up on the Professional Data Engineer exam in a predictable shape: you have data sitting somewhere (a file, an on-prem MySQL instance, a running production database), and you need to pick the right path to land it in Cloud SQL. The exam rewards matching the scenario to the right tool rather than reciting every option in isolation, so this article walks through the four import categories you need to recognize and the small set of operational details that tend to decide the answer.

The four import categories

Cloud SQL accepts data through four broad mechanisms, and almost every Professional Data Engineer scenario maps cleanly to one of them:

  • SQL dump files: a logical backup of a database, including schema and data, used for one-time restores or migrations.
  • CSV files: flat tabular data, usually exported from a source system or produced by an upstream pipeline.
  • Replication or direct transfer: ongoing synchronization from an external MySQL or PostgreSQL instance into Cloud SQL.
  • Database Migration Service: a Google-managed service that handles the migration end to end, including continuous replication and cutover.

If the question says "a backup file we received from a vendor," you are probably looking at a SQL dump. If it says "the source database keeps changing and we cannot afford downtime," you are looking at Database Migration Service or a Datastream-based change data capture flow. CSV is for analytics-style flat extracts. Recognizing which bucket the scenario belongs to is half the battle.

Importing SQL dumps and CSVs from Cloud Storage

The standard pattern for both SQL dumps and CSV files is the same: upload to Cloud Storage first, then import from there. Cloud SQL does not pull from arbitrary URLs or local disk, and the exam expects you to know that GCS is the staging layer. The gcloud sql import commands point at a gs:// URI, and the Cloud SQL service account needs read access to that bucket.
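
A minimal sketch of setting up that read access, reusing my-instance and my-bucket from the commands below. Which role the docs recommend depends on the operation (roles/storage.objectAdmin covers both import and export), so treat the objectViewer grant here as a starting point rather than the definitive choice:

# Look up the instance's service account, then grant it read on the bucket
SA=$(gcloud sql instances describe my-instance \
    --format='value(serviceAccountEmailAddress)')

gcloud storage buckets add-iam-policy-binding gs://my-bucket \
    --member="serviceAccount:${SA}" \
    --role=roles/storage.objectViewer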

For a SQL dump:

gcloud sql import sql my-instance gs://my-bucket/dump.sql.gz \
    --database=my_database

For CSV:

gcloud sql import csv my-instance gs://my-bucket/data.csv \
    --database=my_database \
    --table=my_table

Cloud SQL imports compressed .gz files directly, and you should take advantage of that. Compression reduces transfer cost, storage cost, and import time. If an answer choice describes uploading a multi-gigabyte uncompressed dump and a different choice describes gzipping it first, the gzip path is the one the exam expects.
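
Staging the compressed file is two commands; a quick sketch, assuming a local dump.sql and the bucket used above:

# Compress, then stage in Cloud Storage; the import command is unchanged
gzip dump.sql
gcloud storage cp dump.sql.gz gs://my-bucket/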

SQL dump flags and format rules

The single most common Cloud SQL import gotcha on the Professional Data Engineer exam is the set of constraints on SQL dump files. A dump produced with default mysqldump options will frequently fail to import. The flags you need to recognize are listed here, with a combined example sketched after the list:

  • --databases: include the CREATE DATABASE and USE statements so the dump targets the right database.
  • --hex-blob: encode binary columns as hex so they survive the dump-and-restore round trip.
  • --skip-triggers: strip triggers, which Cloud SQL imports do not allow.
  • --set-gtid-purged=OFF: prevent the dump from setting GTID state that Cloud SQL will reject.
  • --ignore-table: exclude tables you do not want to migrate.
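
Put together, a mysqldump invocation shaped for a Cloud SQL import looks roughly like this sketch; the host, user, and the excluded scratch_table are placeholders:

# Dump with Cloud SQL-friendly flags, gzipped for direct import
mysqldump -h source-host -u dump_user -p \
    --databases my_database \
    --hex-blob \
    --skip-triggers \
    --set-gtid-purged=OFF \
    --ignore-table=my_database.scratch_table \
    | gzip > dump.sql.gz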

The underlying rule is simple: SQL dump files imported into Cloud SQL cannot contain triggers, views, or stored procedures. Those objects have to be handled separately after the data load, or omitted from the dump entirely. If a question describes an import that keeps failing and the dump contains stored procedures, the fix is to strip them out and recreate them post-import.
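
A rough way to check whether a dump you received contains any of those objects, assuming it is gzipped:

# Surface any trigger, view, or routine definitions hiding in the dump
zcat dump.sql.gz | grep -E 'CREATE.*(TRIGGER|VIEW|PROCEDURE|FUNCTION)'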

Database Migration Service for ongoing migrations

Database Migration Service (DMS) is the right answer when the scenario involves a live source database that needs to keep serving traffic during the migration. DMS performs an initial snapshot of the source MySQL or PostgreSQL instance and then continuously replicates changes until you are ready to cut over. The destination is Cloud SQL, the service is managed, and the downtime window can be as short as the final cutover step.

If you see phrases like "minimal downtime", "continuous replication during migration", or "assess and migrate an on-prem PostgreSQL database", DMS is almost always the intended answer over a manual dump-and-import.
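
In practice DMS is usually driven from the console, but the gcloud shape of a migration makes the moving parts concrete: a connection profile describing the source, and a migration job linking it to the destination. A rough sketch with placeholder names and connectivity details; verify the exact flags against the current gcloud database-migration reference:

# Connection profile for the external MySQL source
gcloud database-migration connection-profiles create mysql src-profile \
    --region=us-central1 \
    --host=10.0.0.5 --port=3306 \
    --username=dms_user --prompt-for-password

# Continuous job: initial snapshot, then replication until cutover
# (the Cloud SQL destination profile is assumed to exist already)
gcloud database-migration migration-jobs create mysql-migration \
    --region=us-central1 \
    --type=CONTINUOUS \
    --source=src-profile \
    --destination=cloudsql-dest-profile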

Datastream for change data capture

Datastream is the change data capture (CDC) service in Google Cloud. It reads the binary log on MySQL or the write-ahead log on PostgreSQL and streams row-level changes to a destination, typically BigQuery or Cloud Storage. Datastream is not primarily a tool for loading data into Cloud SQL. It is the answer when the scenario asks for low-latency replication of Cloud SQL data into an analytics layer, or when an operational database needs to feed a downstream system without batch ETL.

The exam often pairs Cloud SQL with Datastream as the source: production sits in Cloud SQL, Datastream captures changes, and BigQuery becomes the analytics target. Knowing the direction of flow matters.
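
A sketch of that direction of flow in gcloud terms, with the two connection profiles and the small source and destination config files assumed to exist already; the flag names follow the Datastream CLI but are worth double-checking against the current reference:

# Stream changes from the MySQL binlog into BigQuery, with a backfill pass
gcloud datastream streams create cloudsql-to-bq \
    --location=us-central1 \
    --display-name=cloudsql-to-bq \
    --source=mysql-source-profile \
    --mysql-source-config=mysql_source_config.json \
    --destination=bq-dest-profile \
    --bigquery-destination-config=bq_destination_config.json \
    --backfill-all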

File format considerations

A few format details round out the topic. SQL dumps need to match the destination engine, so a MySQL dump goes into Cloud SQL for MySQL and a PostgreSQL dump goes into Cloud SQL for PostgreSQL. Cross-engine moves require an export to a neutral format like CSV, or a tool like DMS that handles the translation. CSV imports need the column order and types to match the target table, and any embedded newlines or quoting issues will surface as cryptic row-level errors. Staging the file in Cloud Storage gives you a chance to inspect it before the import runs.
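
That inspection can be as light as streaming the first rows straight out of the bucket before the import runs:

# Peek at the staged file without downloading it
gcloud storage cat gs://my-bucket/data.csv | head -5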

My Professional Data Engineer course covers Cloud SQL import paths alongside the rest of the operational database content you need for exam day, including how DMS and Datastream fit into the broader migration and replication story.
