
When I work through BigQuery with people preparing for the Professional Data Engineer exam, the dataset-level settings tend to be the part that quietly trips people up. Tables and queries get all the attention, but a BigQuery dataset is the container where you make three decisions that ripple through everything you do later: where the data physically lives, how long it sticks around, and how it gets encrypted. Each of those is a fair target for a PDE question, and each has a few subtle rules that are easy to mix up under exam pressure.
This article walks through region versus multi-region location, dataset and table expiration settings, and the Google-managed versus customer-managed encryption choice. These are the configuration decisions you make when the dataset is created, and they shape the behavior of every table that lands inside it.
The first thing to internalize is that in BigQuery, location is configured at the dataset level, not the table level. Once you pick a location for a dataset, every table inside that dataset lives in the same location, and you cannot move a dataset to a different location after the fact. That single sentence is worth memorizing for the Professional Data Engineer exam because it shows up in scenarios where someone tries to join tables across locations, or copy data between datasets without realizing the locations have to be compatible.
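To make that compatibility constraint concrete, here is a minimal sketch in plain Python of the check BigQuery effectively enforces before a table copy or cross-dataset join. The function name and the simplification (exact location match) are mine; the location strings are real BigQuery location codes used for illustration:

```python
def can_copy(source_location: str, dest_location: str) -> bool:
    """Return True if a table copy between these dataset locations is allowed.

    Simplified sketch: BigQuery requires both datasets to be in the same
    location, so "us-central1" to "us-central1" works, but "us-central1"
    to the "EU" multi-region does not.
    """
    return source_location == dest_location

print(can_copy("us-central1", "us-central1"))  # True
print(can_copy("us-central1", "EU"))           # False
```

If the locations differ, the fix is to recreate the dataset in the right location and reload the data, because the dataset itself cannot be moved.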
You have two flavors to choose from. A regional dataset pins the data to a single Google Cloud region, such as us-central1 or europe-west1, which is the choice when you need data residency in one geography. A multi-region dataset places the data in a large geographic area, such as US or EU, and BigQuery stores it redundantly across multiple regions within that area for higher availability.
On the exam, watch for keywords. Compliance, residency, and cost-sensitive in one geography all point at regional. High availability, disaster recovery, and global users all point at multi-region.
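As a study aid, that keyword mapping can be sketched as a tiny lookup. The keyword sets below are my own shorthand for the exam signals, not an official taxonomy:

```python
def choose_location_type(requirements: set) -> str:
    """Map exam-style requirement keywords to a location choice (sketch)."""
    regional_signals = {"compliance", "residency", "cost-sensitive"}
    multi_region_signals = {"high availability", "disaster recovery", "global users"}
    if requirements & multi_region_signals:
        return "multi-region"
    if requirements & regional_signals:
        return "region"
    return "unclear"

print(choose_location_type({"residency"}))     # region
print(choose_location_type({"global users"}))  # multi-region
```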
The second dataset setting that comes up a lot is expiration. BigQuery lets you automatically delete tables after a set period of time, which is a clean way to manage storage costs and enforce a retention policy without writing any code.
There are two levels you can configure. At the dataset level, a default table expiration applies to every new table created inside the dataset. At the table level, an expiration can be set on an individual table.
The PDE exam loves the override rule. If you see a question where a dataset has a default expiration and one table has a different expiration, the table-level value wins. The dataset default only applies when no table-level value has been set.
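The override rule fits in one line of code. A minimal sketch, assuming expirations expressed in days and None meaning "not set":

```python
from typing import Optional

def effective_expiration_days(dataset_default: Optional[float],
                              table_setting: Optional[float]) -> Optional[float]:
    """Table-level expiration wins; the dataset default only fills the gap."""
    return table_setting if table_setting is not None else dataset_default

print(effective_expiration_days(30, 7))     # 7: table overrides dataset
print(effective_expiration_days(30, None))  # 30: dataset default applies
```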
The third dataset-level configuration is encryption. BigQuery encrypts your data at rest by default, so the question is not whether your data is encrypted but who controls the keys.
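Whichever way that question is answered, a customer-managed key is always referenced by its Cloud KMS resource name. The project, key ring, and key names below are made up for illustration; the shape of the path is the part worth remembering:

```python
import re

# Cloud KMS key resource name format (the four segments are project,
# location, key ring, and key; the example names are hypothetical).
KMS_KEY_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)

def is_kms_key_name(name: str) -> bool:
    """Check whether a string has the shape of a Cloud KMS key resource name."""
    return bool(KMS_KEY_PATTERN.match(name))

print(is_kms_key_name(
    "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key"))  # True
```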
One detail that has shown up on practice questions is what happens during copies. If you configure CMEK at the dataset level, copies of data between tables inside that dataset automatically inherit the encryption settings, so you do not need to specify a key in the copy step. If CMEK is only set at the table level instead, you have to specify the key during copies. That little asymmetry is exactly the kind of detail the Professional Data Engineer exam likes to hide inside a longer scenario.
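That asymmetry can be captured in a one-line sketch. The scope labels "dataset" and "table" are my own shorthand for where CMEK was configured:

```python
def copy_needs_explicit_key(cmek_scope: str) -> bool:
    """Whether a table copy must restate the CMEK key (sketch).

    "dataset" scope: copies within the dataset inherit the key automatically.
    "table" scope: each copy must specify the key again.
    """
    return cmek_scope == "table"

print(copy_needs_explicit_key("dataset"))  # False
print(copy_needs_explicit_key("table"))    # True
```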
Encryption can also be set at the table level, similar to expiration: dataset-level CMEK is the default that flows down to new tables, and a table-level key is the override.
The pattern across all three settings is the same: dataset-level is the default, table-level overrides. Location is the one exception, because location is fixed at the dataset level and tables inherit it without an override option. Memorize that asymmetry and a chunk of BigQuery configuration questions become easy.
When you see a scenario question, work it in this order. First, ask whether the requirement is about residency, latency, or availability, because that decides region versus multi-region. Second, ask whether retention needs to be uniform or per-table, because that decides where you set expiration. Third, ask whether compliance requires customer-owned keys, because that decides Google-managed versus CMEK. Three settings, three questions, and you have covered the most-tested part of BigQuery dataset configuration.
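Those three questions can be written down as a checklist function. This is a study sketch with made-up parameter names, not anything from the BigQuery API:

```python
def dataset_config_checklist(residency_bound: bool,
                             uniform_retention: bool,
                             customer_keys_required: bool) -> dict:
    """Turn the three scenario questions into concrete dataset settings."""
    return {
        "location": "region" if residency_bound else "multi-region",
        "expiration": "dataset default" if uniform_retention else "per-table override",
        "encryption": "CMEK" if customer_keys_required else "Google-managed",
    }

# A residency-bound dataset with uniform retention and no key requirement:
print(dataset_config_checklist(True, True, False))
```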
My Professional Data Engineer course covers BigQuery dataset configuration in depth, including the region, expiration, and encryption settings that this article walks through.