External Tables vs Federated Queries in BigQuery for the PDE Exam

GCP Study Hub
619c7c8da6d7b95cf26f6f70
December 15, 2025

One of the more confusing pairs on the Professional Data Engineer exam is external tables versus federated queries in BigQuery. They sound interchangeable. They both let you query data without copying it into BigQuery storage. The exam will absolutely give you a scenario where one is the right answer and the other is wrong, and the trap is that students who treat them as synonyms will guess based on which name sounds better in the question. So I want to walk through the real distinction the way I cover it in the course, with the source systems each one supports, the syntax that gives it away, and the patterns that show up in PDE scenarios.

What an external table actually is

An external table in BigQuery is a table definition that points to data living somewhere else. You define the table inside BigQuery, you give it a schema (or let BigQuery autodetect one), but the actual rows live in another service. When a query runs, BigQuery reads the source data, processes the query against it, and returns results just like it would for a native table.

The supported sources for external tables are:

  • Cloud Storage (the common one, for CSV, JSON, Parquet, Avro, ORC files)
  • Bigtable
  • Google Drive (Sheets, mainly)

The mental model is that the external table is a thin BigQuery-side wrapper. From the perspective of a SQL writer, it feels native. You write standard BigQuery SQL against it. You can join it to a native table. You can use it inside a view. The data itself never gets ingested into BigQuery managed storage, which means you avoid duplication and avoid storage cost on the BigQuery side. The tradeoff is that query performance is generally worse than a native table because BigQuery has to read the source on every query and cannot apply its usual storage optimizations.

What a federated query is

A federated query is a different mechanism. Instead of defining a persistent table object, you call the EXTERNAL_QUERY function inside a SQL statement and pass it a connection plus a SQL string. BigQuery sends that SQL string out to the source database, the source database runs the query, and the results come back to BigQuery as a temporary table that the rest of your BigQuery query can join against or filter further.

The supported sources for federated queries are the operational GCP databases:

  • Cloud SQL (MySQL, PostgreSQL, SQL Server)
  • Spanner
  • AlloyDB

The two giveaways on the exam are the EXTERNAL_QUERY function and the SQL dialect. With a federated query, the inner SQL string runs on the source database, so you write it in that database's dialect. If you are querying Cloud SQL for PostgreSQL, that inner string is Postgres SQL. If you are querying Spanner, it is Spanner SQL. The outer query that wraps EXTERNAL_QUERY is normal BigQuery SQL, but the part inside the function call is not.

A simple shape looks like this:

SELECT *
FROM EXTERNAL_QUERY(
  "projects/my-project/locations/us/connections/my-cloudsql-conn",
  "SELECT customer_id, signup_date FROM customers WHERE active = true"
)

That inner string runs on the Cloud SQL instance. The result comes back to BigQuery as a temp table for the duration of the outer query.

The exam-relevant distinctions

If you remember nothing else, remember the source list. The source system in the scenario tells you which one applies:

  • If the scenario mentions Cloud Storage, Bigtable, or Google Drive, the answer involves an external table.
  • If the scenario mentions Cloud SQL, Spanner, or AlloyDB, the answer involves a federated query.

That alone resolves most PDE questions on this topic. The second distinction worth holding onto is how the external data is treated:

  • An external table is treated as a native BigQuery table from the SQL writer's perspective. You query it like any other table.
  • A federated query returns results as a temporary table. The source data is never treated as native and there is no persistent BigQuery object representing it.

The third distinction is the SQL dialect. External tables use standard BigQuery SQL because BigQuery itself is reading the underlying files. Federated queries push the inner SQL to the source database, so you write that inner string in the source's dialect.

When each one is the right architectural choice

External tables fit when you have a lot of file-based data sitting in Cloud Storage that you want to query occasionally without paying to ingest it, or when you want to use BigQuery as the SQL layer over a data lake. They also fit when business users keep operational data in Google Sheets and analysts need to join it with warehouse data.

Federated queries fit when you have transactional data living in an operational database and you want to run an analytical query that combines that operational data with what is already in BigQuery, without building an ETL pipeline to copy it over. They are also useful for one-off lookups against operational data from inside a larger BigQuery workflow.

For the Professional Data Engineer exam, scenarios that emphasize "without moving the data" or "keep data in its source system" are pointing at one of these two features. Your job is to pick the right one based on the source service named in the question.

My Professional Data Engineer course covers external tables, federated queries, and the rest of the BigQuery surface area you need to walk into the exam with confidence.

Get tips and updates from GCP Study Hub

arrow