BigQuery Federated Queries: Reading External Databases in Place

GCP Study Hub
June 3, 2026

A BigQuery federated query lets you query another Google Cloud database from BigQuery and retrieve the results as a temporary table, without first copying that data into BigQuery. It works against Cloud SQL, AlloyDB, and Spanner. The original data stays in its source database, and BigQuery reaches out to read it at query time. This is useful when you want to run analytics on data that lives across several Google Cloud services while keeping each dataset in its original place. On the Professional Cloud Database Engineer exam, federated queries tend to come up as the answer when a scenario asks you to analyze operational data from BigQuery but does not want, or does not have time for, a pipeline that moves that data first.

How a federated query works

You run a federated query with the EXTERNAL_QUERY function. EXTERNAL_QUERY tells BigQuery to send a query to the external database, run it there, and return the rows back to BigQuery as a temporary result you can then join, filter, or aggregate alongside native BigQuery data. Because BigQuery is reading from the source rather than from a stored copy, there is nothing to keep in sync and no separate load step to maintain.

One detail that matters here is the SQL dialect. The query you pass inside EXTERNAL_QUERY has to use the external database's SQL dialect, not BigQuery's. The portion of the query that runs against Cloud SQL, AlloyDB, or Spanner is written in that source system's dialect, and BigQuery executes it remotely as written. Mixing up which dialect applies to which part of the statement is the kind of distinction the exam likes to test.

SELECT *
FROM EXTERNAL_QUERY(
  'connection_id',
  'SELECT customer_id, status FROM customers WHERE status = 1'
);

Performance and cost characteristics

Federated queries are typically slower than queries against native BigQuery tables. BigQuery is reaching out to a separate database and waiting on it to run the query and return rows, so a federated query does not perform like one that scans data already stored in BigQuery's own columnar format. That is a reasonable trade for skipping data movement, but it is worth keeping in mind when a workload is latency sensitive or runs frequently.

The cost advantage comes from what you avoid. Because the data is never duplicated into BigQuery, you do not pay to store a second copy, and you do not have to build and operate a data pipeline to keep that copy current. Eliminating data duplication and the deployment of pipelines is the cost-effective part of the feature, and it is often the reason a federated query is the intended answer over an approach that extracts and loads the data first.

What to carry into the exam

The points worth holding onto are straightforward. Federated queries read from Cloud SQL, AlloyDB, and Spanner without moving the data. They use the EXTERNAL_QUERY function, the embedded query uses the source database's SQL dialect, and the results come back as a temporary table in BigQuery. They are generally slower than native BigQuery queries but avoid the cost and maintenance of duplicating data and running pipelines. When a Professional Cloud Database Engineer scenario describes analyzing operational data from BigQuery while leaving it in place, federated queries are usually what the question is pointing toward.

Our Professional Cloud Database Engineer course covers BigQuery federated queries alongside Cloud SQL and Spanner, with practice questions that drill these distinctions.

Get tips and updates from GCP Study Hub

arrow