BigQuery DML Operations and Quotas for the PDE Exam

GCP Study Hub
January 3, 2026

DML is one of those topics that sounds basic until you sit down for the Professional Data Engineer exam and realize the questions are not testing whether you can write an INSERT. They are testing whether you know when INSERT is the wrong choice, when UPDATE will quietly burn through a quota you cannot raise, and when MERGE is the only sane answer. I want to walk through how I frame this section for candidates studying for the Professional Data Engineer certification, because the framing matters more than memorizing the syntax.

What DML actually is

DML stands for Data Manipulation Language. It is the part of SQL that deals with the data inside tables, as opposed to DDL, or Data Definition Language, which deals with the structure of the tables themselves. DDL creates and alters tables and schemas. DML inserts, updates, deletes, and merges the rows that live inside those tables.

That distinction matters on the exam because question stems will sometimes describe a workflow and you have to pick the right type of operation. If the prompt is about changing a column type or adding a partition, that is DDL. If the prompt is about modifying actual rows of data, that is DML, and the quota and efficiency tradeoffs come into play.
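
To make the split concrete, here is a minimal sketch assuming a hypothetical mydataset.customers table; the column and dataset names are mine, not from any exam material.

```sql
-- DDL: changes the structure of the table
ALTER TABLE mydataset.customers ADD COLUMN loyalty_tier STRING;

-- DML: changes the rows that live inside the table
UPDATE mydataset.customers
SET loyalty_tier = 'gold'
WHERE lifetime_spend > 10000;
```

If the question stem describes the first kind of statement, the quota discussion below does not apply; if it describes the second, it does.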

The four DML commands you need to know

BigQuery supports the standard set of DML commands. The exam expects you to know what each one does and when to reach for it.

  • INSERT adds new rows to a table. A typical statement looks like INSERT INTO employees (id, name) VALUES (1, 'John Doe').
  • UPDATE modifies existing rows. It has to locate the rows that match a predicate and rewrite them.
  • DELETE removes rows from a table. You scope it with a WHERE clause, which BigQuery requires on DELETE; to clear the whole table you write WHERE true.
  • MERGE is the set-based operation. It compares a source to a target and inserts, updates, or deletes rows in a single statement based on conditions you define.
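
Since the INSERT example above already shows its shape, here is what the next two look like in BigQuery syntax, assuming the same hypothetical employees table:

```sql
-- UPDATE: locate the rows matching the predicate and rewrite them
UPDATE mydataset.employees
SET name = 'Jane Doe'
WHERE id = 1;

-- DELETE: scoped by a predicate (BigQuery requires the WHERE clause)
DELETE FROM mydataset.employees
WHERE id = 1;
```

MERGE deserves its own example, and it gets one in the quota section below.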

INSERT versus UPDATE on the exam

This is the comparison that shows up most often. Both can technically solve the same problem, and the exam will hand you a scenario where either works but only one is the best answer.

The classic example is logging changes to data. Imagine you are tracking edits to customer records over time. You could write an UPDATE that overwrites the existing row every time the customer's data changes, or you could write an INSERT that appends a new row with a timestamp every time something changes. Both produce a working system. Only one is the right answer for the exam.

INSERT wins for continuous logging. It is a simpler operation because BigQuery does not have to search for and modify existing rows. It just appends. That makes it the natural choice for time-series data, audit logs, and anything where you want to preserve history.

UPDATE is the wrong choice for time-series specifically because it overwrites historical values. Once the original entry is gone, it is gone. And on top of that, UPDATE is less efficient when applied frequently to tables that grow quickly. It has to scan, locate, and rewrite, which is computationally more expensive than appending.

The mental shortcut I give candidates is this. If the prompt mentions logging, history, time-series, or high-frequency writes, lean toward INSERT. If the prompt mentions correcting bad data or reconciling a small batch of changes, UPDATE is fine.
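
The append-only pattern looks like this in practice. This is a sketch assuming a hypothetical customer_history table with a timestamp column; the point is that nothing is overwritten, so history survives.

```sql
-- Append a new row for every change instead of rewriting the old one.
-- BigQuery only has to append, not scan-locate-rewrite.
INSERT INTO mydataset.customer_history (customer_id, email, changed_at)
VALUES (42, 'new@example.com', CURRENT_TIMESTAMP());
```

Querying the latest state then becomes a matter of taking the most recent row per customer_id, and every earlier value is still there for auditing.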

DML quotas and the quotaExceeded error

Here is where BigQuery diverges from the relational databases candidates often come in with. BigQuery enforces quotas on DML operations. If you run too many DML statements in a day, for example updating millions of records one batch at a time, you can hit a quotaExceeded error.

The detail that catches people off guard is that you cannot increase this quota. Many GCP quotas can be raised by filing a request. The DML quota cannot. So if a Professional Data Engineer exam question describes a team that keeps hitting quotaExceeded on BigQuery DML operations and asks what to do, the answer is not to ask for a quota increase.

Why MERGE is the answer to DML quota questions

The recommended workaround is MERGE. The reason is structural. Every UPDATE statement counts against the daily quota, so applying changes batch by batch burns through it quickly. MERGE is set-based: a single MERGE statement applies what might otherwise take thousands of individual UPDATE statements, and it counts as one operation.

The workflow looks like this. You transfer the data you want to apply into a new staging table, then you run a MERGE that compares that staging table to your target table and applies the inserts, updates, and deletes in one shot. That collapses the operation count and stays under the quota, and it tends to be more efficient overall because BigQuery can plan the work as one job instead of many.
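
Sketched in BigQuery syntax, assuming hypothetical customers and customers_staging tables (the column names and the is_deleted flag are illustrative, not from any exam material):

```sql
-- Step 1 (not shown): load the day's changes into customers_staging,
-- e.g. via a load job, which does not consume DML quota.
-- Step 2: apply all inserts, updates, and deletes in one statement.
MERGE mydataset.customers AS target
USING mydataset.customers_staging AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND source.is_deleted THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET email = source.email, updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, email, updated_at)
  VALUES (source.customer_id, source.email, source.updated_at);
```

However many rows the staging table holds, this is a single DML statement, which is exactly why it sidesteps the quota problem.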

If you see an exam question that combines two signals, large daily DML volume and the quotaExceeded error, the answer almost always involves staging the changes and using MERGE.

What to lock in before exam day

For the Professional Data Engineer exam, the DML topic comes down to a few clean rules. DML is for data, DDL is for structure. INSERT is preferred over UPDATE for logging and time-series. BigQuery has a hard DML quota that cannot be raised. MERGE is the escape hatch when you are pushing high volumes of changes, and it requires staging the data in a new table first.

If you can apply those rules to a scenario question without rereading the choices three times, you are ready for this section.

My Professional Data Engineer course covers BigQuery DML operations and quotas with the exam framing you need to pick the right answer under time pressure.
