BigQuery External Tables vs Federated Queries for the PCA Exam

GCP Study Hub
Ben Makansi
January 10, 2026

BigQuery has two ways to query data that lives outside of BigQuery without copying it in first. External tables and federated queries both solve a similar problem, but they work differently and apply to different sources. The Professional Cloud Architect exam expects you to know which one fits which scenario.

External tables

An external table is a BigQuery table definition that points to data stored somewhere else. The data stays where it is, and BigQuery reads it directly when you run a query. The most common source is Cloud Storage, but Bigtable and Google Drive are also supported.

The way I think about it is that the external table behaves just like a native BigQuery table once it's defined. You write standard BigQuery SQL against it. You join it with native tables. You apply views on top of it. The only difference is that the bytes live in Cloud Storage (or Bigtable, or Drive) instead of BigQuery's managed storage.

This is useful when you have large datasets in Cloud Storage that you don't want to duplicate. CSV, Parquet, Avro, JSON, ORC files in a bucket can be queried in place. You avoid storage costs in BigQuery and you avoid the operational overhead of building an ingestion pipeline just to run analytics.

Federated queries

Federated queries solve a different problem. They let BigQuery push a query out to another GCP database and pull the results back as a temporary table. The supported sources are Cloud SQL, Spanner, and AlloyDB.

The mechanism is the EXTERNAL_QUERY() function. You wrap a query inside it, and BigQuery sends that query to the external database, gets the results, and treats them as a temp table inside your larger BigQuery query.

Two things matter for the Professional Cloud Architect exam here. First, the data is not treated as a native BigQuery table. It comes back as a temporary result set scoped to the query you're running. Second, the inner query has to be written in the SQL dialect of the source database. If you're federating to Cloud SQL for PostgreSQL, you write PostgreSQL SQL inside EXTERNAL_QUERY(). If you're federating to Spanner, you write Spanner SQL.

How they compare

Both approaches keep the data in its original location, which is the headline similarity. But the differences are what the exam tests.

External tables point to file storage or wide-column storage. Cloud Storage, Bigtable, Drive. The external data is treated as a native-looking table, queryable with standard BigQuery SQL.

Federated queries push SQL to relational databases. Cloud SQL, Spanner, AlloyDB. The results come back as a temp table, not a persistent table object, and the inner SQL uses the source database's dialect.

If a question describes querying Parquet files in a bucket from BigQuery, that's an external table. If a question describes pulling live data from a Cloud SQL instance into a BigQuery analytics query, that's a federated query.

Why this distinction matters on the exam

The Professional Cloud Architect exam often frames data integration questions around minimizing data movement. Both of these features satisfy that constraint, so the exam will narrow it further by naming the source. Cloud Storage points you to external tables. Cloud SQL, Spanner, or AlloyDB points you to federated queries. Bigtable and Drive also point to external tables, even though they're less common scenarios.

Watch for the SQL dialect detail too. If a scenario mentions writing PostgreSQL or MySQL syntax from inside a BigQuery query, that's a tell for federated queries via EXTERNAL_QUERY().

My Professional Cloud Architect course covers BigQuery external tables and federated queries alongside the rest of the storage and analytics material.

arrow