Dataform for the PDE Exam: ELT, Assertions, Version Control

GCP Study Hub
March 23, 2026

Dataform is one of those services that shows up on the Professional Data Engineer exam in a fairly predictable way. If a question mentions ELT, BigQuery, and managing SQL transformations as code, Dataform is almost always the right answer. The trick is knowing exactly what Dataform does, what it does not do, and which keywords in a question should make you reach for it instead of Cloud Dataflow or Cloud Composer.

I want to walk through the parts of Dataform that the Professional Data Engineer exam actually tests, and leave you with a clear mental model of when to pick it on a question.

What Dataform Actually Is

Dataform is a service for managing and automating SQL-based data transformations. It lives inside Google Cloud, integrates natively with BigQuery, and gives you a structured way to define tables, views, and transformation logic as code. The headline features are SQL-based transformations, version control integration, workflow automation, a collaborative environment, and seamless integration with BigQuery.

That last point is the one to internalize. On the exam, Dataform is almost always paired with BigQuery. If a question is about transforming data already sitting in BigQuery using SQL, and the architecture needs to be version-controlled and repeatable, Dataform is the service to pick.

Dataform Is ELT, Not ETL

This is the single most important distinction for exam questions. Dataform is optimized for ELT workflows, where data is loaded into the warehouse first and then transformed in place. The order matters.

  • Extract: pull data from source systems
  • Load: land the raw data into BigQuery
  • Transform: run SQL transformations inside BigQuery, orchestrated by Dataform

If a question describes a workflow where data has to be transformed before it lands in the warehouse, that is ETL, and Dataform is not the right tool. Cloud Dataflow is built for that case. So the heuristic is straightforward. ELT plus BigQuery plus SQL points to Dataform. ETL with transformations happening in-flight points to Dataflow. The exam will lean on this distinction.

SQLX Files and What You Define in Them

Dataform projects are built out of SQLX files. A SQLX file is a SQL file with a config block at the top that tells Dataform what kind of object the file produces. The three main types you should know are tables, views, and incremental tables.

A table definition looks like this:

config {
  type: "table"  // materialize the query result as a BigQuery table
}

SELECT
  user_id,
  customer_id,
  email,
  created_at
FROM ${ref("raw_users")}  -- declares a dependency on raw_users

The ref function is how SQLX files reference other tables in the project. Dataform uses these references to build a dependency graph, so it knows which transformations depend on which upstream tables. That graph is what lets Dataform run your transformations in the correct order without you writing orchestration code by hand.
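To see the dependency graph in action, here is a minimal sketch of a downstream view. It assumes the table definition above lives in a file named users.sqlx, so Dataform exposes it as users; the file and object names are hypothetical, but the config block and ref pattern are standard SQLX.

config {
  type: "view"
}

-- Dataform runs this after the users table because of the ref() below.
SELECT
  user_id,
  email
FROM ${ref("users")}
WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)

Because this view refs users, and users refs raw_users, Dataform can compile the whole chain and execute it in the right order with no scheduler configuration.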

Incremental tables are worth flagging. Instead of rebuilding the table from scratch on every run, an incremental table appends or merges only the new rows. That is the pattern you want for large event-style tables where a full rebuild would be wasteful.
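A minimal sketch of what that looks like, assuming a hypothetical raw_events source with an event_timestamp column; the when(), incremental(), and self() built-ins are standard Dataform syntax:

config {
  type: "incremental"
}

SELECT
  event_id,
  user_id,
  event_timestamp
FROM ${ref("raw_events")}
-- On incremental runs, only process rows newer than what the table
-- already contains; on the first run the WHERE clause is omitted.
${when(incremental(),
  `WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM ${self()})`)}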

Assertions for Data Quality

Assertions are the Dataform feature most likely to surface on a Professional Data Engineer question about data quality. An assertion is a check that runs as part of the transformation pipeline and fails the run if a data integrity condition is violated. You can check for uniqueness, non-null values, row counts, and custom conditions.

The canonical example is a non-null assertion:

config {
  type: "table",
  assertions: {
    // Fail the run if any of these columns contains a NULL
    nonNull: ["user_id", "customer_id", "email"]
  }
}

SELECT ...

Any row produced by this transformation where user_id, customer_id, or email is null will trip the assertion and the pipeline run will be flagged as failed. That is exactly the behavior you want for critical identifier columns. The exam framing usually sounds like "how do you automatically validate data quality during transformation" and the answer is Dataform assertions.
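For conditions that go beyond the built-in checks, a standalone assertion file works the same way. Here is a sketch with hypothetical table and column names: a file with type: "assertion" defines a query where any returned row counts as a violation and fails the run.

config {
  type: "assertion"
}

-- Every row this query returns is a failure: orders must have
-- a positive total, so any match trips the assertion.
SELECT
  order_id,
  order_total
FROM ${ref("orders")}
WHERE order_total <= 0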

Version Control

Dataform integrates with Git out of the box. Your SQLX files, config, and dependency definitions all sit in a Git repository, which means every change to your transformations is reviewable, revertible, and auditable. You can branch, open pull requests, and merge changes the same way you would with application code.

On the exam, if a question asks how to bring software engineering practices like code review and version history to a SQL-based transformation pipeline, Dataform is the answer. That is one of the strongest differentiators between Dataform and writing ad-hoc scheduled queries directly in BigQuery.

Workflow Automation

Once your transformations are defined and committed, Dataform handles scheduled execution through workflow configurations. You can define release configurations and workflow configurations that run your project on a schedule, against a target BigQuery dataset, with the dependency graph respected automatically. For most BigQuery-centric ELT pipelines, this removes the need to stand up a separate orchestrator like Cloud Composer just to fire off SQL.
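The schedule itself is defined in the workflow configuration rather than in code, but the compilation target the schedule runs against comes from the project's settings file. A sketch of a dataform.json with hypothetical project and dataset names (in newer Dataform core versions the same settings live in workflow_settings.yaml):

{
  "warehouse": "bigquery",
  "defaultDatabase": "my-gcp-project",
  "defaultSchema": "analytics",
  "defaultLocation": "US",
  "assertionSchema": "dataform_assertions"
}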

That said, if the question describes a broader workflow that spans multiple services beyond BigQuery, with branching logic and complex dependencies across products, Cloud Composer is still the right answer. Dataform's orchestration is scoped to its own SQL transformation graph.

Quick Exam Heuristics

  • ELT plus BigQuery plus SQL transformations: Dataform
  • ETL with in-flight transformation: Cloud Dataflow
  • Automatic data quality checks inside the pipeline: Dataform assertions
  • Version-controlled, reviewable SQL transformations: Dataform with Git
  • Cross-service orchestration across many GCP products: Cloud Composer

Keep those mappings in your head and you will catch most Dataform questions on the Professional Data Engineer exam without overthinking them.

My Professional Data Engineer course covers Dataform alongside the rest of the BigQuery transformation and orchestration tooling you need to recognize on test day, with the specific exam framing for each service.
