BigQuery Table Partitioning for the PDE Exam

December 29, 2025

Table partitioning is one of those BigQuery concepts that sounds straightforward until you sit down for the Professional Data Engineer exam and realize the questions are not asking you to define partitioning. They are asking you to pick the right partitioning strategy for a specific workload, predict how it affects query cost, and reason about retention. This article walks through what I actually drill candidates on when partitioning shows up on the Professional Data Engineer blueprint.

What partitioning actually does

Partitioning splits a single BigQuery table into smaller pieces called partitions, based on a column or on ingestion time. From the user's perspective the table still looks like one table, you still query it with normal SQL, and the schema is unchanged. Under the hood, BigQuery stores each partition separately, which means a query that filters on the partitioning column only scans the partitions it needs.

That last sentence is where most exam questions live. BigQuery bills on-demand queries by bytes scanned. Suppose a table holds a year of data, about 4 TB spread fairly evenly across the days, and you partition it by day: a query that filters to a single day touches roughly 1/365th of the data. The same query returns the same result, at dramatically lower cost and usually with a faster response. When the exam asks you to reduce query cost on a large, fast-growing table, partitioning is almost always part of the answer.
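To make the arithmetic concrete, here is a toy Python cost model. It assumes the 4 TB is spread evenly over 365 daily partitions and that the query reads every column. PRICE_PER_TB is a hypothetical placeholder, not a quoted BigQuery rate, so check current on-demand pricing before plugging in real numbers.

```python
TABLE_BYTES = 4 * 10**12   # ~4 TB, assumed evenly spread over one year of data
DAILY_PARTITIONS = 365
PRICE_PER_TB = 5.0         # hypothetical on-demand rate, for illustration only

def bytes_scanned(days_in_filter: int, partitioned: bool) -> int:
    # Without partitioning, a date filter still reads the whole table.
    if not partitioned:
        return TABLE_BYTES
    # With daily partitions, only the partitions matching the filter are read.
    return TABLE_BYTES * days_in_filter // DAILY_PARTITIONS

def cost(nbytes: int) -> float:
    return nbytes / 10**12 * PRICE_PER_TB

full = bytes_scanned(1, partitioned=False)
pruned = bytes_scanned(1, partitioned=True)
print(f"unpartitioned: {full / 1e9:,.0f} GB, ${cost(full):.2f}")
print(f"partitioned:   {pruned / 1e9:,.0f} GB, ${cost(pruned):.2f}")
```

Whatever the real price, the ratio is what matters: the single-day query reads about 1/365th of the bytes.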

The three partitioning types you need to know

BigQuery supports three flavors of partitioning, and the Professional Data Engineer exam expects you to pick between them.

Time-unit column partitioning. You designate a DATE, DATETIME, or TIMESTAMP column in your data as the partitioning column. BigQuery places each row in the partition that matches that column's value at the chosen granularity. This is the right choice when your data carries its own event time, like a transaction_time column on a transactions table.
Ingestion-time partitioning. BigQuery partitions on the time the row was ingested, exposed through the pseudo-column _PARTITIONTIME. Use this when you do not have a reliable event-time column in the row itself, or when you genuinely care about when data landed in BigQuery rather than when the underlying event happened.
Integer range partitioning. You partition on an INT64 column by defining a start, end, and interval. This is the one people forget. It is the right answer when your natural partitioning key is something like a customer ID bucket or a numeric region code, not a timestamp.
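A useful way to internalize the three types is to ask which value decides a row's partition. The sketch below is a simplified Python model, not BigQuery's implementation: it assumes daily granularity for the two time-based types, the function names are hypothetical, and it borrows the special `__NULL__` and `__UNPARTITIONED__` partition IDs BigQuery uses for NULL and out-of-range values.

```python
from datetime import datetime, timezone
from typing import Optional

def time_unit_partition(event_time: Optional[datetime]) -> str:
    # Time-unit column partitioning (daily): the row's own timestamp decides.
    if event_time is None:
        return "__NULL__"
    return event_time.astimezone(timezone.utc).strftime("%Y%m%d")

def ingestion_time_partition(ingested_at: datetime) -> str:
    # Ingestion-time partitioning (daily): the load time decides, whatever the row says.
    return ingested_at.astimezone(timezone.utc).strftime("%Y%m%d")

def integer_range_partition(value: Optional[int], start: int, end: int, interval: int) -> str:
    # Integer range partitioning: buckets of width `interval` covering [start, end).
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    return str(start + (value - start) // interval * interval)

# An event just before midnight UTC that lands in BigQuery just after midnight.
event = datetime(2025, 12, 28, 23, 50, tzinfo=timezone.utc)
landed = datetime(2025, 12, 29, 0, 5, tzinfo=timezone.utc)
print(time_unit_partition(event))                                          # 20251228
print(ingestion_time_partition(landed))                                    # 20251229
print(integer_range_partition(1234, start=0, end=10000, interval=1000))    # 1000
print(integer_range_partition(12345, start=0, end=10000, interval=1000))   # __UNPARTITIONED__
```

The first two lines show why the choice between event time and ingestion time matters: the same late-arriving row lands in different daily partitions under the two schemes.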

The trap on the exam is to assume every partitioned table is partitioned by time. If a question describes data keyed by a numeric ID and asks for partitioning, integer range is usually the answer.

Choosing the partition granularity

For time partitioning, BigQuery lets you partition by hour, day, month, or year. I often use monthly partitions to illustrate the concept, but in production monthly and yearly partitioning are less common; they suit tables that hold little data per day spread over a long date range. The workloads that benefit most from partitioning are fast-growing time-series tables, and the standard granularity for those is hour or day.
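Granularity is a trade-off: finer partitions prune more precisely but multiply the partition count, and BigQuery caps how many partitions a table can have (check the current quota). This Python sketch truncates timestamps to the partition ID formats BigQuery reports in INFORMATION_SCHEMA.PARTITIONS and counts how many partitions one year of hourly data produces at each granularity. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Partition ID format for each time granularity.
FORMATS = {"HOUR": "%Y%m%d%H", "DAY": "%Y%m%d", "MONTH": "%Y%m", "YEAR": "%Y"}

def partition_id(ts: datetime, granularity: str) -> str:
    # Truncate a UTC timestamp to the partition it would fall into.
    return ts.astimezone(timezone.utc).strftime(FORMATS[granularity])

# One row per hour for all of 2025 (a non-leap year): 365 * 24 timestamps.
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
hours = [start + timedelta(hours=h) for h in range(365 * 24)]

for granularity in FORMATS:
    distinct = {partition_id(ts, granularity) for ts in hours}
    print(granularity, len(distinct))  # HOUR 8760, DAY 365, MONTH 12, YEAR 1
```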

Some common time-partitioned data sources you should recognize on the Professional Data Engineer exam:

IoT sensor data streaming in from connected devices. Worth flagging that for high-write, low-latency workloads Bigtable is often a better fit than BigQuery, but when the workload lands in BigQuery, partitioning by hour or day is the move.
Transaction logs from retail or ecommerce systems, which grow continuously and are almost always queried by date range.
Clickstream data tracking user interactions on a site or app.
Sensor logs of any kind, environmental or machinery.

The thread connecting these is that they are large, fast-growing, time-series, and almost every query against them has a time filter. That is the signature of a good partitioning candidate.
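Most of those queries filter on a date range rather than a single day. The sketch below models which daily partitions a range filter touches, assuming a table that holds all of 2025 and a filter on the partitioning column with literal bounds; a filter only on some other column prunes nothing. The function name is hypothetical.

```python
from datetime import date, timedelta

def partitions_scanned(first_day: date, last_day: date,
                       filter_start: date, filter_end: date) -> list[str]:
    """Daily partition IDs touched by `partition_col BETWEEN filter_start AND filter_end`
    in a table whose data runs from first_day to last_day, inclusive."""
    lo = max(first_day, filter_start)
    hi = min(last_day, filter_end)
    n_days = (hi - lo).days + 1
    # A range that does not overlap the table's data scans no partitions.
    return [(lo + timedelta(days=d)).strftime("%Y%m%d") for d in range(max(n_days, 0))]

scanned = partitions_scanned(date(2025, 1, 1), date(2025, 12, 31),
                             date(2025, 12, 25), date(2025, 12, 31))
print(len(scanned), "of 365 partitions:", scanned[0], "to", scanned[-1])
```

A "last 7 days" dashboard query therefore reads 7 of 365 partitions, however large the table grows.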

Partition expirations

Partition expirations are the other piece the exam likes to test. A partition expiration tells BigQuery to automatically delete a partition once it reaches a specified age. Set a 90-day partition expiration on a daily-partitioned table and BigQuery keeps a rolling 90 days of data and drops the rest. You do not run a script, you do not file a ticket, the older partitions are gone.
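BigQuery does the deleting itself; the Python sketch below only models the outcome of a rolling window. It assumes a daily-partitioned table and simplifies the exact moment a partition's age crosses the threshold, which in BigQuery depends on the partition boundary. The function name is hypothetical.

```python
from datetime import date, timedelta

def surviving_partitions(partition_days: list[date], today: date,
                         expiration_days: int) -> list[date]:
    # Simplified model: a daily partition survives while its date is within
    # `expiration_days` of today. BigQuery's exact boundary timing may differ.
    cutoff = today - timedelta(days=expiration_days)
    return [d for d in partition_days if d > cutoff]

today = date(2025, 12, 29)
# 120 daily partitions ending today.
partition_days = [today - timedelta(days=i) for i in range(120)]
kept = surviving_partitions(partition_days, today, expiration_days=90)
print(len(kept), min(kept), max(kept))  # 90 2025-10-01 2025-12-29
```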

The distinction the exam wants you to draw is between partition expiration and table expiration. Partition expiration deletes individual partitions and leaves the rest of the table in place. Table expiration deletes the entire table when its expiration time is reached. If a question describes a regulatory or storage-cost requirement to retain only the most recent N days of data on an ongoing basis, partition expiration is the right answer. If it describes a temporary scratch table that should disappear in two weeks, table expiration is the right answer.
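The contrast is easy to see side by side. This is a toy Python model, not a BigQuery API: `remaining_partitions` is a hypothetical helper that returns None when table expiration has removed the whole table, and otherwise the daily partitions that partition expiration has left in place (with the same simplified age rule as a rolling window).

```python
from datetime import date, timedelta
from typing import Optional

def remaining_partitions(partition_days: list[date], today: date,
                         partition_expiration_days: Optional[int] = None,
                         table_expires_on: Optional[date] = None) -> Optional[list[date]]:
    """Return None if the table itself is gone, else the daily partitions still present."""
    # Table expiration: the whole table, every partition included, is deleted.
    if table_expires_on is not None and today >= table_expires_on:
        return None
    # Partition expiration: old partitions are trimmed and the table stays.
    if partition_expiration_days is None:
        return list(partition_days)
    cutoff = today - timedelta(days=partition_expiration_days)
    return [d for d in partition_days if d > cutoff]

today = date(2025, 12, 29)
days = [date(2025, 12, 1) + timedelta(days=i) for i in range(29)]  # Dec 1 to Dec 29

rolling = remaining_partitions(days, today, partition_expiration_days=7)
scratch = remaining_partitions(days, today, table_expires_on=date(2025, 12, 15))
print(len(rolling), scratch)  # 7 None
```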

What I look for on the exam

When a Professional Data Engineer question mentions a large BigQuery table, a slow or expensive query, and a filter on a date or timestamp column, partitioning should be the first thing you think about. From there, the decisions are: which partition type fits the data, what granularity fits the query pattern, and does this workload have a retention requirement that should be enforced with partition expiration rather than a custom cleanup job.

Partitioning is also frequently combined with clustering in exam scenarios. Partitioning prunes the set of partitions a query scans, based on the partition column. Clustering then sorts the data within each partition by up to four additional columns, so filters on those columns can skip blocks that cannot match. Together they handle most cost and performance optimization questions on BigQuery tables.
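Here is a deliberately simplified Python model of the two working together, assuming a table partitioned by day and clustered on a hypothetical customer_id column. BigQuery's real block sizes, block metadata, and automatic reclustering are internal and differ from this; the point is only that partitioning narrows the scan to one day and clustering narrows it further within that day.

```python
from dataclasses import dataclass

@dataclass
class Block:
    partition: str  # daily partition ID, e.g. "20251228"
    min_key: int    # smallest clustering-column value stored in the block
    max_key: int    # largest clustering-column value stored in the block

def build_blocks(rows, rows_per_block):
    """Group (partition_id, customer_id) rows by partition, sort each partition by the
    clustering key, and cut it into fixed-size blocks that record their key range."""
    by_partition = {}
    for pid, key in rows:
        by_partition.setdefault(pid, []).append(key)
    blocks = []
    for pid, keys in by_partition.items():
        keys.sort()
        for i in range(0, len(keys), rows_per_block):
            chunk = keys[i:i + rows_per_block]
            blocks.append(Block(pid, chunk[0], chunk[-1]))
    return blocks

def blocks_read(blocks, partition_id, key):
    # Partition pruning skips other days; clustering skips blocks whose key range excludes `key`.
    return [b for b in blocks
            if b.partition == partition_id and b.min_key <= key <= b.max_key]

days = ["20251227", "20251228", "20251229"]
rows = [(day, customer_id) for day in days for customer_id in range(1000)]
blocks = build_blocks(rows, rows_per_block=100)
same_day = [b for b in blocks if b.partition == "20251228"]
print(len(blocks), len(same_day), len(blocks_read(blocks, "20251228", 437)))  # 30 10 1
```

In this model a query filtering on both the date and customer_id reads 1 of 30 blocks: partition pruning alone gets it to 10, and the sorted key ranges get it to 1.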

My Professional Data Engineer course covers BigQuery partitioning, clustering, and the broader set of storage and query optimization patterns the exam tests.