
Lakehouse for Apache Iceberg (formerly BigLake) is one of those services that shows up in Professional Data Engineer questions whenever the scenario involves data sitting outside BigQuery storage but still needing BigQuery-grade governance. If you have studied external tables, you already know half of the story. Lakehouse is what you reach for when external tables are not enough.
I want to walk through what Lakehouse actually is, how it handles multi-cloud data, how the row and column security model works, and why the Professional Data Engineer exam keeps coming back to it.
Lakehouse is a storage engine that lets you query data sitting in object storage as if it were a native BigQuery table. The data stays as Parquet, ORC, Avro, CSV, or JSON files in Google Cloud Storage, Amazon S3, or Azure Data Lake Storage. You do not move it. You do not copy it. You define a Lakehouse table on top of it and BigQuery treats it like any other table.
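As a rough sketch, defining such a table over Parquet files in Cloud Storage looks something like the statement below. The project, dataset, connection, and bucket names are placeholders made up for illustration, and the sketch assumes a Cloud resource connection already exists whose service account can read the bucket.

```sql
-- Hypothetical names throughout: my_project, my_dataset, my_connection, my-bucket.
-- The WITH CONNECTION clause is what lets BigQuery read the files through the
-- connection's service account instead of the end user's credentials.
CREATE EXTERNAL TABLE `my_project.my_dataset.sales_biglake`
WITH CONNECTION `my_project.us.my_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/sales/*.parquet']
);
```

Because access goes through the connection, users need permission on the table only, not on the underlying bucket.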
The point of this service is to break down data silos without forcing a migration. A lot of organizations have years of historical Parquet files in S3 or ADLS that feed Spark jobs, Hive, Presto, and Trino. Telling those teams to copy everything into BigQuery is a non-starter. Lakehouse gives you a path where the data stays where it is and BigQuery becomes one of many engines that can read it.
That multi-engine access is the part that often catches people off guard. A Lakehouse table is not just a BigQuery construct. Through Lakehouse connectors, the same table definition works with Apache Spark, Presto, Trino, Hive, and Agent Platform (formerly Vertex AI). One copy of the data, one set of access controls, many engines.
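The comparison the Professional Data Engineer exam loves is Lakehouse tables versus BigQuery external tables. Both sit on top of files in object storage. The difference is what you get on top: fine-grained row- and column-level security, consistent access policies across engines, and metadata caching.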
If a scenario says "we need to query Parquet files in Cloud Storage but enforce column-level masking on the PII fields," the answer is Lakehouse, not an external table. If it says "we have analysts using Spark and BigQuery against the same files and we want consistent access policies," that is also Lakehouse.
The security model is where Lakehouse earns its keep on the Professional Data Engineer exam. You can apply access controls at three levels: table, row, and column.
Column-level security uses policy tags from Knowledge Catalog. You create a taxonomy, attach tags like HIGH_SENSITIVITY or PII to columns, and grant the Fine-Grained Reader role on each tag to whoever should see those columns. Anyone without that role gets an access error when they query the tagged column, even if they can read the rest of the table.
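To make that behavior concrete, here is a small sketch of what a user without the Fine-Grained Reader role might run. It assumes, purely for illustration, that the table has `email` and `ssn` columns carrying a policy tag; those column names are not from any real schema.

```sql
-- Assumption: email and ssn carry a policy tag the caller cannot read.
-- A bare SELECT * would fail with an access error, because * expands to
-- include the tagged columns:
--   SELECT * FROM project.dataset.sales_biglake;

-- Excluding the tagged columns lets the same user read the rest of the table.
SELECT * EXCEPT (email, ssn)
FROM project.dataset.sales_biglake;
```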
Row-level security looks more like a SQL filter:
CREATE ROW ACCESS POLICY region_filter
ON project.dataset.sales_biglake
GRANT TO ('group:emea-analysts@example.com')
FILTER USING (region = 'EMEA');
That policy means EMEA analysts only see EMEA rows, no matter what query they run. The critical part for the exam is that these policies travel with the Lakehouse table even when the underlying data lives in S3 or Azure. That is a real differentiator.
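One consequence worth knowing: once a table has any row access policy, a user who is not named in at least one policy sees no rows at all. A common way to keep full visibility for a privileged group is a second policy whose filter is always true. The sketch below assumes a hypothetical `sales-admins` group and a made-up policy name.

```sql
-- Hypothetical group and policy name, for illustration only.
-- FILTER USING (TRUE) matches every row, so members of this group
-- see the whole table while the EMEA policy still restricts analysts.
CREATE ROW ACCESS POLICY all_regions
ON project.dataset.sales_biglake
GRANT TO ('group:sales-admins@example.com')
FILTER USING (TRUE);
```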
Lakehouse is the storage layer that makes BigQuery Omni possible. When you query a Lakehouse table that points at S3, BigQuery Omni runs the compute in AWS, processes the data there, and returns the result. You do not pay egress on the bulk data because the processing happens in the same cloud as the storage. Same model for Azure.
For the exam, the pattern to remember is this: storage stays in the cloud where it lives, compute runs locally to that storage, and the user experience is the same BigQuery UI and SQL dialect everywhere. If a question describes regulatory or cost constraints that prevent moving data out of AWS or Azure, Lakehouse with BigQuery Omni is the answer.
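A table over S3 is defined with much the same statement as one over Cloud Storage. The sketch below assumes a dataset that was created in the `aws-us-east-1` location and an AWS connection that already exists there; the dataset, connection, and bucket names are invented for illustration.

```sql
-- Hypothetical names: my_aws_dataset, my_aws_connection, my-s3-bucket.
-- The dataset and the connection must both live in the AWS region
-- that holds the bucket, which is what keeps the compute next to the data.
CREATE EXTERNAL TABLE `my_project.my_aws_dataset.orders_s3`
WITH CONNECTION `aws-us-east-1.my_aws_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-s3-bucket/orders/*']
);
```

From the analyst's side, querying `orders_s3` then looks like querying any other BigQuery table.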
Lakehouse caches table schemas, partition information, and security policies. That sounds like a small detail but it is one of the practical reasons to convert an external table into a Lakehouse table. Query planning gets faster because BigQuery does not need to re-scan object storage to figure out the schema or list partitions. For large partitioned datasets, this is the difference between a query that returns in seconds and one that spends most of its time on metadata operations.
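Metadata caching is something you opt into through table options. As a minimal sketch, assuming the same placeholder project, connection, and bucket names as any illustrative Cloud Storage table, the two relevant options are `max_staleness` and `metadata_cache_mode`; the four-hour interval is an arbitrary example value, not a recommendation.

```sql
-- Hypothetical names; the staleness interval is an arbitrary example.
-- max_staleness says how old cached metadata may be before a query
-- falls back to reading metadata from object storage.
-- metadata_cache_mode = 'AUTOMATIC' lets BigQuery refresh the cache itself.
CREATE OR REPLACE EXTERNAL TABLE `my_project.my_dataset.sales_biglake`
WITH CONNECTION `my_project.us.my_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/sales/*.parquet'],
  max_staleness = INTERVAL 4 HOUR,
  metadata_cache_mode = 'AUTOMATIC'
);
```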
If you only have time to lock in a few facts before sitting the Professional Data Engineer exam, here is what I would prioritize:
Most Lakehouse questions on the Professional Data Engineer exam reduce to two patterns. Either you need fine-grained security on data that lives outside BigQuery storage, or you need a unified query layer across clouds. Both lead to Lakehouse.
My Professional Data Engineer course covers Lakehouse alongside the rest of the storage and analytics services you need for the exam, including the comparisons with external tables, BigQuery Omni, and Knowledge Catalog (formerly Dataplex) that show up in scenario questions.