
Backup questions on the Professional Data Engineer exam can feel deceptively simple. You see a scenario: somebody deleted a table or corrupted a partition, and you have to pick the right recovery mechanism from four options that all sound vaguely reasonable. The trick is knowing which BigQuery feature exists for which purpose, because they are not interchangeable. Multi-region replication, time travel, snapshots, and exports to Cloud Storage all back up data in some sense, but each one solves a different problem.
In this post I want to walk through the four mechanisms I cover in my Professional Data Engineer course and explain how to tell them apart on exam questions.
When you create a dataset in BigQuery, you pick a location. If you pick a multi-region like US or EU, Google automatically replicates your data across multiple physical locations within that geography. A US multi-region dataset is distributed across data centers in several states. An EU multi-region dataset is spread across data centers in EU member states such as the Netherlands and Germany (notably, it does not include London or Zurich).
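The location choice happens exactly once, at dataset creation, and cannot be changed afterward. A minimal sketch, using a made-up project and dataset name:

CREATE SCHEMA `my-project.backup_demo`
OPTIONS (
  location = 'EU'  -- multi-region; tables in this dataset are replicated across EU data centers
);

Every table created in that dataset inherits the location, and BigQuery handles the replication with no further configuration on your part.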
This replication is designed to keep your data available during large-scale disasters. If a single data center goes offline, your data remains accessible from another location. It gives you high availability and durability for infrastructure failures.
What it does not do is protect you from yourself. If somebody runs a bad DELETE statement or drops a table by accident, that change replicates everywhere. The exam likes to test this distinction. If a question describes someone accidentally deleting data and asks how multi-region datasets help, the answer is that they do not. Multi-region replication is for disaster recovery, not user errors.
Time travel is the feature that catches most accidental changes. BigQuery continuously keeps a history of your tables going back seven days. You can query any prior state of a table during that window with the FOR SYSTEM_TIME AS OF clause followed by a timestamp (the bq command-line equivalent is the @ snapshot decorator).
A typical query looks like this:
SELECT *
FROM `project.dataset.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
WHERE order_id = 12345

That query returns the state of the orders table as it existed two hours ago. You can use this to recover a row that got overwritten, debug a pipeline that produced wrong results, run audits, or do trend analysis comparing past states to current ones.
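Recovery follows the same pattern: select the old state and write it somewhere. A minimal sketch that materializes the table as it was two hours ago (the _recovered name is just for illustration):

CREATE OR REPLACE TABLE `project.dataset.orders_recovered` AS
SELECT *
FROM `project.dataset.orders`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR);
-- Inspect orders_recovered, then copy or merge the good rows back into orders.

From there you can replace the damaged table outright or merge back only the affected rows.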
Two things to remember for the exam. First, time travel gives you the lowest recovery point objective of any BigQuery backup mechanism: the potential data loss is effectively zero because the table history is maintained continuously. Second, the window is capped at seven days. If a question describes needing to recover a table from two weeks ago, time travel is not the answer.
Once you cross that seven-day threshold, you need BigQuery snapshots. A snapshot is a manual, on-demand backup of a table at a specific point in time. Unlike time travel, snapshots stick around as long as you want them to.
The exam scenarios where snapshots are the right answer tend to involve recovery points further back than the seven-day time travel window: keeping a known-good copy before a risky migration, or retaining month-end states for compliance and historical comparison.
Snapshots live inside BigQuery, and you can query them like regular tables (though they are read-only). They are cheap because BigQuery only stores the delta from the source table, but they accumulate cost over time if you keep many of them.
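Creating one is a single DDL statement. A sketch, with made-up names and an optional one-year expiration:

CREATE SNAPSHOT TABLE `project.dataset.orders_snapshot_20250101`
CLONE `project.dataset.orders`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)  -- omit to keep indefinitely
);

Because a snapshot is read-only, restoring means cloning or copying it back to a regular table rather than writing to it in place.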
The fourth backup option is exporting data out of BigQuery into Cloud Storage. You export tables as files in CSV, JSON, Avro, or Parquet format, then store them in a GCS bucket.
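The EXPORT DATA statement does this from SQL. A sketch, assuming a hypothetical bucket named my-backup-bucket (the URI needs a wildcard because large exports are sharded across multiple files):

EXPORT DATA OPTIONS (
  uri = 'gs://my-backup-bucket/orders/*.avro',
  format = 'AVRO',
  overwrite = true
) AS
SELECT * FROM `project.dataset.orders`;

The bq extract command does the same thing from the command line.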
The main reason to do this instead of snapshots is cost. Cloud Storage with Coldline or Archive storage classes is cheaper than keeping data in BigQuery, especially for backups you rarely touch. If a question describes a team wanting the cheapest long-term backup option for data they almost never access, exports to GCS with a cold storage class is the answer.
The tradeoff is that exported data is no longer queryable in place. You have to load it back into BigQuery or query it via external tables before you can use it again.
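The external table route leaves the files in Cloud Storage and defines a read-only table over them. A sketch against the same hypothetical bucket:

CREATE EXTERNAL TABLE `project.dataset.orders_archive`
OPTIONS (
  format = 'AVRO',
  uris = ['gs://my-backup-bucket/orders/*.avro']
);

Queries against orders_archive read straight from the bucket, so they are slower than native tables, but for one-off lookups you avoid the reload step entirely.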
The last pattern is one that the course explicitly flags as exam-relevant: backing up time-partitioned tables partition by partition.
If your table is partitioned by day or week, you can export each partition separately to Cloud Storage. When an error hits, say a bad upstream feed corrupted yesterday's data, you do not need to restore or analyze the entire table. You restore just the affected partition.
This matters for two reasons. It cuts the work of recovery down to the smallest unit that contains the problem. And it makes incremental backup cheap because you only export the partitions that changed since your last backup run.
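In SQL this falls out of partition pruning: filter the export on the partitioning column and BigQuery only reads that partition. A sketch, assuming the table is partitioned by a DATE column called order_date (a made-up name):

EXPORT DATA OPTIONS (
  uri = 'gs://my-backup-bucket/orders/order_date=2024-06-01/*.avro',
  format = 'AVRO'
) AS
SELECT *
FROM `project.dataset.orders`
WHERE order_date = '2024-06-01';  -- pruned to the single affected partition

Restoring is the mirror image: load those files back into just that partition, for example with bq load and the $20240601 partition decorator.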
When a backup question shows up on the Professional Data Engineer exam, I run through the same mental checklist. Is the failure an infrastructure outage or regional disaster? That is what multi-region replication handles. Is it an accidental change within the last seven days? Time travel. Older than seven days? A snapshot, if somebody thought to take one. Is the priority the cheapest possible long-term storage for data that is rarely touched? Export to Cloud Storage on a cold storage class. And if the table is partitioned and the damage is confined to a known date range, recover at the partition level.
Getting these distinctions right means matching the mechanism to the failure mode, not just recognizing that BigQuery has backup features.
My Professional Data Engineer course covers BigQuery backup and recovery in full, including the partition-level patterns the exam likes to test.