External tables are one of the more useful features in BigQuery, and the Associate Cloud Engineer exam tests a specific framing of them. This article covers what external tables are, when they make sense, the performance trade-off versus loading data natively, and the question patterns to watch for.
It does not cover the full set of supported source formats in exhaustive detail, partitioning strategies for external data, or the BigLake variant that adds fine-grained access control on top. Those are useful in a real job but are not what the ACE exam tests.
A regular BigQuery table stores its data inside BigQuery's managed storage. You load data in, BigQuery owns the bytes from that point forward, and queries run against that storage.
An external table is different. The table definition lives in BigQuery, but the actual data sits somewhere else. Most often, that somewhere else is Cloud Storage. You point BigQuery at a set of files, define the schema, and BigQuery treats it as a queryable table. When you run a query, BigQuery reads the data straight out of Cloud Storage, processes it, and gives you the results.
The data is never copied into BigQuery. There is no load step. Update the underlying file in Cloud Storage and the next query sees the new data.
The biggest reason is avoiding a duplicate copy of data. If you already have CSV files, JSON, Avro, or Parquet sitting in a Cloud Storage bucket and you want to query them with SQL, an external table lets you do it without setting up an ingestion pipeline.
It also gives you near real-time access. The moment a file lands in the bucket, it is queryable through the external table. There is no load delay.
And it is flexible on format. CSV, JSON, Avro, Parquet, ORC. BigQuery handles them all as external sources without you having to transform or normalize first.
This is the part to rank loudly. External tables are slower than native BigQuery tables. That matters more than most of the other details.
BigQuery's native storage is columnar, compressed, and tightly integrated with the query engine. When you query a native table, BigQuery reads only the columns you reference and uses internal optimizations that depend on owning the storage layout. When you query an external table backed by Cloud Storage, BigQuery has to reach across the network to read the files, parse them, and process them. That overhead shows up as longer query times.
For occasional queries against a dataset, that trade-off is fine. For dashboards that hit the same data hundreds of times a day, it is not. Load that data into a native table.
"Federated query" is the broader term for querying data outside BigQuery. External tables on Cloud Storage are the most common case, but BigQuery can also federate queries against Cloud SQL, Spanner, and Bigtable. The same trade-off applies. Convenient access, slower performance.
The Associate Cloud Engineer exam pattern is consistent. A scenario describes a team that has data in Cloud Storage and wants to run SQL queries on it. The question asks for the simplest or fastest-to-set-up approach.
If the question emphasizes "without copying the data" or "without loading", external tables are the answer. The wrong answer is usually setting up a Dataflow pipeline or a load job, both of which are heavier than needed.
The flip side. If a scenario describes a team running heavy analytical workloads on the same data repeatedly and complaining about query speed, the answer is to load the data into a native BigQuery table, not stick with the external table. The exam wants you to recognize when external is the right call and when it is not.
If you see "data in Cloud Storage" and "query with SQL" together, think external table. If you see "slow queries" or "high query volume" on top of that, think load it natively.
You can create an external table from the BigQuery console or the bq CLI. The CLI version:
bq mk --external_table_definition=schema.json@CSV=gs://my-bucket/data/*.csv \
my_dataset.my_external_table
The schema can be inferred or specified explicitly. The wildcard in the URI lets the table cover many files at once, which is the typical pattern for log data or daily exports.
External tables let BigQuery query data in Cloud Storage as if it were a regular table, with no load step. The trade-off is slower query performance because the data is read across the network at query time. They are the right answer when you want flexible, near real-time SQL access without duplicating data, and the wrong answer when the same dataset is being queried heavily and would benefit from being loaded natively.
My Associate Cloud Engineer course covers external tables in the BigQuery section alongside the storage and pricing model that frames why this trade-off exists.