
One of the first conceptual splits the Professional Data Engineer exam asks you to reason about is ETL versus ELT. The two acronyms contain the same three letters, but the order of those letters changes which Google Cloud services you reach for, how your storage costs accrue, and how quickly downstream teams get their hands on data. I want to walk through how I think about the two patterns when a scenario question lands in front of me, and how to pick a winner without overthinking it.
The starting point is the same for both. Source data lives in operational databases, SaaS APIs, files dropped into buckets, and application logs. The shape is inconsistent, the schemas drift, and analytics teams need it centralized in a warehouse or lake. Extract, Transform, Load and Extract, Load, Transform are two ordering choices for that centralization. Both extract from sources, both end with usable data in a target system, and they differ on one question: do you transform the data before it lands, or after?
That single ordering decision drives almost every tradeoff you will see on the Professional Data Engineer exam, so I anchor my thinking there before I start matching services.
ETL is the traditional pattern. You pull data out of the source, run it through a transformation layer where you clean fields, apply business rules, join lookups, and aggregate where needed, and only then load the curated result into the warehouse. The warehouse never sees the raw rows. It only sees the modeled output.
This approach makes sense in a few situations: when sensitive fields must be masked or removed before the data ever touches the warehouse, when the target system is not built to run heavy transformations itself, or when downstream consumers should only ever see curated, modeled tables.
On Google Cloud, an ETL pipeline often looks like Cloud Storage or a source database feeding Dataflow, which applies the transformations before writing curated tables into BigQuery. Dataproc serves the same role when teams prefer Spark. The transform layer is the star, and the warehouse is just the destination.
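To make the ordering concrete, here is a minimal sketch of the ETL shape in plain Python. In a real pipeline the transform step would run in Dataflow (Apache Beam) or in Spark on Dataproc; the function names, fields, and masking rule below are illustrative assumptions, not a real schema.

```python
# ETL sketch: transform BEFORE load, so the "warehouse" only ever
# sees curated rows. All names here are illustrative stand-ins.

def extract():
    # Stand-in for reading from a source database or Cloud Storage files.
    return [
        {"user_id": "42", "email": "a@example.com", "amount": "19.99"},
        {"user_id": "43", "email": "b@example.com", "amount": "5.00"},
    ]

def transform(row):
    # Clean types, apply business rules, and mask sensitive fields
    # before the row ever reaches the target.
    return {
        "user_id": int(row["user_id"]),
        "email_domain": row["email"].split("@", 1)[1],  # mask the address
        "amount": float(row["amount"]),
    }

def load(rows, warehouse):
    # Stand-in for writing curated tables into BigQuery.
    warehouse.extend(rows)

warehouse = []
load([transform(r) for r in extract()], warehouse)
# The warehouse never held a raw email address, only the curated output.
```

The point of the ordering is visible in the last comment: raw rows exist only in flight, so nothing sensitive is ever queryable in the target.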
ELT flips the order. You pull data out of the source and land it, raw, into the target system. Transformation happens later, inside the warehouse or lake, using its own compute. The warehouse is no longer just a destination. It is also the transformation engine.
ELT dominates cloud-native architectures for three reasons.
First, storage is cheap. Keeping multiple copies of raw data, in different formats, for different teams, used to be a budget conversation. In the cloud it is rounding error. Marketing can have its slice, finance can have a different aggregation, and the raw layer keeps sitting there for audit and replay.
Second, ingestion is faster. The data is usable for analysts the moment it lands. You do not have to wait for a transformation job to finish before the warehouse has anything to show. Transformations happen on demand against the raw tables.
Third, modern warehouses are built for it. BigQuery is the canonical example. Its separation of storage and compute, combined with SQL-based transformation tools, makes load-first pipelines the natural choice. You ingest with Storage Transfer Service, the BigQuery Data Transfer Service, Datastream for change data capture, or Dataflow in append mode, and then you transform with scheduled queries, Dataform, or dbt running on top of BigQuery.
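The load-first ordering can be sketched in a few lines, with sqlite3 standing in for BigQuery so the example stays runnable. In a real pipeline the load step would be Datastream, the BigQuery Data Transfer Service, or Dataflow in append mode, and the SQL would live in a scheduled query or a Dataform/dbt model; the tables and columns below are illustrative assumptions.

```python
import sqlite3

# ELT sketch: land the rows RAW first, then transform inside the
# engine's own compute with SQL. sqlite3 is a stand-in for BigQuery.

db = sqlite3.connect(":memory:")

# Load: raw data lands untouched, messy types and all.
db.execute("CREATE TABLE raw_events (user_id TEXT, amount TEXT, event_type TEXT)")
db.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("42", "19.99", "purchase"), ("42", "5.00", "purchase"), ("43", "n/a", "refund")],
)

# Transform: on demand, inside the warehouse, leaving the raw layer
# intact for audit and replay. Each team can define its own view
# over the same raw rows.
db.execute("""
    CREATE VIEW purchases_by_user AS
    SELECT user_id, SUM(CAST(amount AS REAL)) AS total
    FROM raw_events
    WHERE event_type = 'purchase'
    GROUP BY user_id
""")

total = db.execute(
    "SELECT total FROM purchases_by_user WHERE user_id = '42'"
).fetchone()[0]
```

Notice that the raw table is queryable the moment the inserts finish, which is exactly the fast-ingestion benefit described above.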
When a Professional Data Engineer scenario question hands me a pipeline to design, I read for a few specific signals.
The exam will not always say "use ELT" directly. It will describe a team that wants fast ingestion, multiple analytical use cases, and a BigQuery target, and you have to recognize that as an ELT pattern. Or it will describe a regulated data flow that needs masking before it touches the warehouse, and that is ETL.
You will sometimes see ETL and ELT framed as a moral choice, where ELT is "modern" and ETL is "legacy". That framing leaks into questions, but it is not how I would answer. Both are valid patterns and both come up on the exam. The right answer is the one that matches the constraints in the scenario, not the one that feels more current.
The pattern I see most often in Google Cloud architectures is a hybrid. Raw data lands in BigQuery or Cloud Storage through an ELT-style ingestion, and then Dataform or scheduled queries transform it into curated marts that look very much like the output of an ETL job. The ordering is ELT, but the final consumer-facing tables follow ETL hygiene. Knowing both patterns lets you read those hybrid designs correctly.
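The hybrid can be sketched the same way, again with sqlite3 standing in for BigQuery and with illustrative table names. The second step plays the role of a scheduled query or Dataform model: it rebuilds a curated mart from the raw layer, so consumers get ETL-grade hygiene while the raw rows stay available for replay.

```python
import sqlite3

# Hybrid sketch: ELT-style raw landing, followed by a scheduled-query-
# style step that materializes a curated mart. Names are illustrative.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
db.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "emea", "10.00"), ("2", "emea", "15.50"), ("3", "amer", "7.25")],
)

# The "scheduled query": rebuild the consumer-facing mart from raw,
# idempotently, so reruns are safe.
db.execute("DROP TABLE IF EXISTS mart_revenue_by_region")
db.execute("""
    CREATE TABLE mart_revenue_by_region AS
    SELECT region, SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    GROUP BY region
""")

# Raw rows survive for audit and replay; the mart is what analysts query.
raw_count = db.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
emea = db.execute(
    "SELECT revenue FROM mart_revenue_by_region WHERE region = 'emea'"
).fetchone()[0]
```

The ordering is ELT, but the mart table looks exactly like the output of an ETL job, which is the hybrid described above.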
My Professional Data Engineer course covers ETL and ELT pipeline design on Google Cloud, including which services match each pattern and how to read scenario questions for the signal that picks the winner.