BigQuery DDL Operations for the PDE Exam

GCP Study Hub
619c7c8da6d7b95cf26f6f70
January 1, 2026

BigQuery questions on the Professional Data Engineer exam tend to cluster around two areas. The first is performance optimization, things like partitioning and clustering. The second is the actual SQL you need to write to manage tables, schemas, and views. That second area is governed by DDL, and it shows up enough on the exam that it deserves its own walkthrough.

DDL stands for Data Definition Language. It is the part of SQL that defines, modifies, and removes the structure of database objects. When I think about DDL, I picture the architecture of a database rather than the rows of data inside it. Tables, schemas, indices, views, those are all DDL territory. Compare that to DML, Data Manipulation Language, which handles the actual data sitting inside those structures. DML is for INSERT, UPDATE, DELETE, and MERGE. DDL is for CREATE, ALTER, DROP, and TRUNCATE. Knowing which family a command belongs to is the first thing I check whenever I see a BigQuery SQL question on a practice exam.

Why DDL matters in BigQuery

BigQuery is organized in a clear hierarchy. You start with a project, which contains datasets, which contain tables and views. DDL is how you create and manage every layer below the project. If you want a new table inside a dataset, you write DDL. If you want to add a column to that table six months later because the data model evolved, you write DDL. If you want to wipe the table clean without recreating it, you write DDL. Every structural change to a BigQuery object goes through a DDL statement.

One thing I want to flag for anyone studying for the Professional Data Engineer exam. You can do many of these operations through the Cloud Console or the bq command line tool. The exam still expects you to recognize the SQL syntax. If you see a CREATE TABLE statement on a multiple choice question, you need to read it and know what it does without leaning on a UI.

The four DDL commands to know

There are four DDL commands that come up over and over.

  • CREATE builds new database objects. Tables, views, indexes, materialized views, external tables. If a structure does not exist yet, CREATE brings it into existence.
  • ALTER modifies an existing object. The classic example is adding a new column to a table when your schema needs to grow. ALTER is also how you change table options like expiration time or descriptions.
  • DROP deletes an object outright. The table, view, or index is gone, along with its structure. Use it when you are done with something for good.
  • TRUNCATE sits in the middle. It removes all the rows from a table but keeps the table structure intact. The schema stays, the data leaves. I reach for TRUNCATE when I want to reuse a table without rebuilding it.

Here is the canonical example of CREATE TABLE in BigQuery, the kind of statement you should be able to read fluently.

CREATE TABLE human_resources.employees (
  id INT64,
  name STRING
);

This creates a table called employees in the human_resources dataset. It has two columns, an integer id and a string name. Nothing fancy, but the structure shows up everywhere. Notice that BigQuery uses INT64 and STRING rather than INT and VARCHAR. That is one of those small details the Professional Data Engineer exam likes to test, because candidates who only know standard ANSI SQL sometimes miss it.

Partitioning through DDL

Partitioning is one of the biggest performance levers in BigQuery, and you set it up through DDL at table creation time. The exam expects you to know not just when to partition, but what the syntax looks like. Here is the pattern.

CREATE TABLE logs.log_events
(
  id INT64,
  event_name STRING,
  event_timestamp TIMESTAMP
)
PARTITION BY DATE(event_timestamp);

The PARTITION BY clause does the work. In this case I am partitioning the log_events table on the date extracted from event_timestamp. Each calendar day becomes its own partition under the hood. Future queries that filter on event_timestamp will only scan the partitions they actually need, which means less data scanned and lower cost. That cost angle is exactly why partitioning shows up in so many exam scenarios.

A few things worth remembering about partitioning syntax. You can partition by a DATE or TIMESTAMP column, by an INT64 column with a range specification, or by ingestion time using _PARTITIONDATE or _PARTITIONTIME. You define partitioning when the table is created. Adding a partition strategy to an existing unpartitioned table is not something you do with a simple ALTER. You typically recreate the table and copy data over.

How DDL questions show up on the exam

When the Professional Data Engineer exam tests DDL, it usually does it in one of three ways. The first is a syntax question that shows you a CREATE TABLE statement and asks what it does or whether it is valid. The second is a scenario where you need to pick the right command, and the answer hinges on whether you want to keep the structure (TRUNCATE) or wipe it entirely (DROP). The third is a partitioning or clustering question that puts a CREATE TABLE statement in front of you and asks why the query cost dropped or why a filter is efficient.

If you can read a CREATE TABLE statement with PARTITION BY and explain what BigQuery is doing behind the scenes, you are in good shape for most of these.

My Professional Data Engineer course covers BigQuery DDL operations alongside the partitioning, clustering, and query optimization topics that go with them.

Get tips and updates from GCP Study Hub

arrow