
When I work through BigQuery with people preparing for the Professional Data Engineer exam, the dataset-level settings tend to be the part that quietly trips people up. Tables and queries get all the attention, but a BigQuery dataset is the container where you make three decisions that ripple through everything you do later: where the data physically lives, how long it sticks around, and how it gets encrypted. Each of those is a fair target for a PDE question, and each has a few subtle rules that are easy to mix up under exam pressure.
This article walks through region versus multi-region location, dataset and table expiration settings, and the Google-managed versus customer-managed encryption choice. These are the configuration decisions you make when the dataset is created, and they shape the behavior of every table that lands inside it.
The first thing to internalize is that in BigQuery, location is configured at the dataset level, not the table level. Once you pick a location for a dataset, every table inside that dataset lives in the same location, and you cannot move a dataset to a different location after the fact. That single sentence is worth memorizing for the Professional Data Engineer exam because it shows up in scenarios where someone tries to join tables across locations, or copy data between datasets without realizing the locations have to be compatible.
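To make that compatibility constraint concrete, here is a minimal sketch in plain Python of the check BigQuery effectively enforces before a table copy or cross-dataset join. The function name and the simplification (exact location match) are mine; the location strings are real BigQuery location codes used for illustration:

```python
def can_copy(source_location: str, dest_location: str) -> bool:
    """Return True if a table copy between these dataset locations is allowed.

    Simplified sketch: BigQuery requires both datasets to be in the same
    location, so "us-central1" to "us-central1" works, but "us-central1"
    to the "EU" multi-region does not.
    """
    return source_location == dest_location

print(can_copy("us-central1", "us-central1"))  # True
print(can_copy("us-central1", "EU"))           # False
```

If the locations differ, the fix is to recreate the dataset in the right location and reload the data, because the dataset itself cannot be moved.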
You have two flavors to choose from. A regional dataset pins the data to a single Google Cloud region, such as us-central1 or europe-west1, which is the choice when you need data residency in one geography. A multi-region dataset places the data in a large geographic area, such as US or EU, and BigQuery stores it redundantly across multiple regions within that area for higher availability.
On the exam, watch for keywords. Compliance, residency, and cost-sensitive in one geography all point at regional. High availability, disaster recovery, and global users all point at multi-region.
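As a study aid, that keyword mapping can be sketched as a tiny lookup. The keyword sets below are my own shorthand for the exam signals, not an official taxonomy:

```python
def choose_location_type(requirements: set) -> str:
    """Map exam-style requirement keywords to a location choice (sketch)."""
    regional_signals = {"compliance", "residency", "cost-sensitive"}
    multi_region_signals = {"high availability", "disaster recovery", "global users"}
    if requirements & multi_region_signals:
        return "multi-region"
    if requirements & regional_signals:
        return "region"
    return "unclear"

print(choose_location_type({"residency"}))     # region
print(choose_location_type({"global users"}))  # multi-region
```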
The second dataset setting that comes up a lot is expiration. BigQuery lets you automatically delete tables after a set period of time, which is a clean way to manage storage costs and enforce a retention policy without writing any code.
There are two levels you can configure. At the dataset level, a default table expiration applies to every new table created inside the dataset. At the table level, an expiration can be set on an individual table.
The PDE exam loves the override rule. If you see a question where a dataset has a default expiration and one table has a different expiration, the table-level value wins. The dataset default only applies when no table-level value has been set.
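The override rule fits in one line of code. A minimal sketch, assuming expirations expressed in days and None meaning "not set":

```python
from typing import Optional

def effective_expiration_days(dataset_default: Optional[float],
                              table_setting: Optional[float]) -> Optional[float]:
    """Table-level expiration wins; the dataset default only fills the gap."""
    return table_setting if table_setting is not None else dataset_default

print(effective_expiration_days(30, 7))     # 7: table overrides dataset
print(effective_expiration_days(30, None))  # 30: dataset default applies
```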
The third dataset-level configuration is encryption. BigQuery encrypts your data at rest by default, so the question is not whether your data is encrypted but who controls the keys.
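Whichever way that question is answered, a customer-managed key is always referenced by its Cloud KMS resource name. The project, key ring, and key names below are made up for illustration; the shape of the path is the part worth remembering:

```python
import re

# Cloud KMS key resource name format (the four segments are project,
# location, key ring, and key; the example names are hypothetical).
KMS_KEY_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)

def is_kms_key_name(name: str) -> bool:
    """Check whether a string has the shape of a Cloud KMS key resource name."""
    return bool(KMS_KEY_PATTERN.match(name))

print(is_kms_key_name(
    "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"))  # True
```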
One detail that has shown up on practice questions is what happens during copies. If you configure CMEK at the dataset level, copies of data between tables inside that dataset automatically inherit the encryption settings, so you do not need to specify a key in the copy step. If CMEK is only set at the table level instead, you have to specify the key during copies. That little asymmetry is exactly the kind of detail the Professional Data Engineer exam likes to hide inside a longer scenario.
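That asymmetry can be captured in a one-line sketch. The scope labels "dataset" and "table" are my own shorthand for where CMEK was configured:

```python
def copy_needs_explicit_key(cmek_scope: str) -> bool:
    """Whether a table copy must restate the CMEK key (sketch).

    "dataset" scope: copies within the dataset inherit the key automatically.
    "table" scope: each copy must specify the key again.
    """
    return cmek_scope == "table"

print(copy_needs_explicit_key("dataset"))  # False
print(copy_needs_explicit_key("table"))    # True
```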
Encryption can also be set at the table level, similar to expiration: dataset-level CMEK is the default that flows down to new tables, and a table-level key is the override.
The pattern across all three settings is the same: dataset-level is the default, table-level overrides. Location is the one exception, because location is fixed at the dataset level and tables inherit it without an override option. Memorize that asymmetry and a chunk of BigQuery configuration questions become easy.
When you see a scenario question, work it in this order. First, ask whether the requirement is about residency, latency, or availability, because that decides region versus multi-region. Second, ask whether retention needs to be uniform or per-table, because that decides where you set expiration. Third, ask whether compliance requires customer-owned keys, because that decides Google-managed versus CMEK. Three settings, three questions, and you have covered the most-tested part of BigQuery dataset configuration.
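Those three questions can be written down as a checklist function. This is a study sketch with made-up parameter names, not anything from the BigQuery API:

```python
def dataset_config_checklist(residency_bound: bool,
                             uniform_retention: bool,
                             customer_keys_required: bool) -> dict:
    """Turn the three scenario questions into concrete dataset settings."""
    return {
        "location": "region" if residency_bound else "multi-region",
        "expiration": "dataset default" if uniform_retention else "per-table override",
        "encryption": "CMEK" if customer_keys_required else "Google-managed",
    }

# A residency-bound dataset with uniform retention and no key requirement:
print(dataset_config_checklist(True, True, False))
```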
My Professional Data Engineer course covers BigQuery dataset configuration in depth, including the region, expiration, and encryption settings that this article walks through.