Lakehouse for the PDE Exam: Multi-Cloud Tables and Granular Security

GCP Study Hub
April 11, 2026

Lakehouse for Apache Iceberg (formerly BigLake) is one of those services that shows up in Professional Data Engineer questions whenever the scenario involves data sitting outside BigQuery storage but still needing BigQuery-grade governance. If you have studied external tables, you already know half of the story. Lakehouse is what you reach for when external tables are not enough.

I want to walk through what Lakehouse actually is, how it handles multi-cloud data, how the row and column security model works, and why the Professional Data Engineer exam keeps coming back to it.

What Lakehouse is and why it exists

Lakehouse is a storage engine that lets you query data sitting in object storage as if it were a native BigQuery table. The data stays as Parquet, ORC, Avro, CSV, or JSON files in Google Cloud Storage, Amazon S3, or Azure Data Lake Storage. You do not move it. You do not copy it. You define a Lakehouse table on top of it and BigQuery treats it like any other table.
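To make "define a Lakehouse table on top of it" concrete, here is a minimal sketch of the DDL for Parquet files in Cloud Storage. The project, dataset, table, connection, and bucket names are all hypothetical, and it assumes a Cloud resource connection already exists whose service account can read the bucket. The WITH CONNECTION clause is what separates this from a plain external table definition.

```sql
-- Hypothetical names throughout: my_project, lake_ds, sales_biglake,
-- gcs_conn, and the bucket path are placeholders for illustration.
-- Assumes the connection lives in the US multi-region, matching the dataset.
CREATE EXTERNAL TABLE `my_project.lake_ds.sales_biglake`
WITH CONNECTION `my_project.us.gcs_conn`
OPTIONS (
  format = 'PARQUET',
  -- The files stay in the bucket; nothing is loaded into BigQuery storage.
  uris = ['gs://my-sales-bucket/sales/*.parquet']
);
```

Once the table exists, queries reach the files through the connection's service account, so end users need access to the table but not to the bucket itself.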

The point of this service is to break down data silos without forcing a migration. A lot of organizations have years of historical Parquet files in S3 or ADLS that feed Spark jobs, Hive, Presto, and Trino. Telling those teams to copy everything into BigQuery is a non-starter. Lakehouse gives you a path where the data stays where it is and BigQuery becomes one of many engines that can read it.

That multi-engine access is the part that often catches people off guard. A Lakehouse table is not just a BigQuery construct. Through Lakehouse connectors, the same table definition works with Apache Spark, Presto, Trino, Hive, and Agent Platform (formerly Vertex AI). One copy of the data, one set of access controls, many engines.

The exam framing: Lakehouse versus BigQuery external tables

This is the comparison the Professional Data Engineer exam loves. Both Lakehouse tables and BigQuery external tables sit on top of files in object storage. The difference is what you get on top.

  • External tables give you SQL access to files, but you cannot apply row-level or column-level security. Performance is limited because metadata is not cached. Access control happens at the bucket level through IAM.
  • Lakehouse tables let you treat external data as native. You get row-level and column-level security, metadata caching for faster queries, and a single governance layer that works across engines.

If a scenario says "we need to query Parquet files in Cloud Storage but enforce column-level masking on the PII fields," the answer is Lakehouse, not an external table. If it says "we have analysts using Spark and BigQuery against the same files and we want consistent access policies," that is also Lakehouse.

Granular security through policy tags

The security model is where Lakehouse earns its keep on the Professional Data Engineer exam. You can apply access controls at three levels:

  • Table-level permissions through standard IAM roles.
  • Row-level security through row access policies that filter what each user sees based on a SQL predicate.
  • Column-level security through policy tags attached to specific columns.

Column-level security uses policy tags from Knowledge Catalog. You create a taxonomy, attach tags like HIGH_SENSITIVITY or PII to columns, and grant the Fine-Grained Reader role on each tag to whoever should see those columns. Anyone without that role gets an access error when they query the tagged column, even if they can read the rest of the table.
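Here is a small sketch of what that looks like from the querying side. It assumes a hypothetical customers_biglake table whose email column carries a PII policy tag, and a caller who does not hold Fine-Grained Reader on that tag. The table and column names are made up for illustration.

```sql
-- Hypothetical table and column. Assumes email is tagged and the caller
-- lacks the Fine-Grained Reader role on that policy tag.

-- Fails with an access error, because * expands to include email.
SELECT *
FROM `my_project.lake_ds.customers_biglake`;

-- Works: the tagged column is left out, so the rest of the table is readable.
SELECT * EXCEPT (email)
FROM `my_project.lake_ds.customers_biglake`;
```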

Row-level security looks more like a SQL filter:

-- Members of the EMEA analysts group see only rows where region = 'EMEA'.
CREATE ROW ACCESS POLICY region_filter
ON project.dataset.sales_biglake
GRANT TO ('group:emea-analysts@example.com')
FILTER USING (region = 'EMEA');

That policy means EMEA analysts only see EMEA rows, no matter what query they run. The critical part for the exam is that these policies travel with the Lakehouse table even when the underlying data lives in Amazon S3 or Azure storage. That is a real differentiator.

Multi-cloud through BigQuery Omni

Lakehouse is the storage layer that makes BigQuery Omni possible. When you query a Lakehouse table that points at S3, BigQuery Omni runs the compute in AWS, processes the data there, and returns the result. You do not pay egress on the bulk data because the processing happens in the same cloud as the storage. Same model for Azure.

For the exam, the pattern to remember is this: storage stays in the cloud where it lives, compute runs locally to that storage, and the user experience is the same BigQuery UI and SQL dialect everywhere. If a question describes regulatory or cost constraints that prevent moving data out of AWS or Azure, Lakehouse with BigQuery Omni is the answer.
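As a rough sketch of that pattern for S3, the DDL looks almost identical to the Cloud Storage case. The names below are hypothetical, and it assumes the dataset was created in the aws-us-east-1 location and that the connection maps to an AWS IAM role with read access to the bucket. The order_status column is also an assumption.

```sql
-- Hypothetical names: my_project, aws_ds, orders_s3, s3_conn, and the bucket.
-- Assumes dataset aws_ds lives in the aws-us-east-1 BigQuery Omni location.
CREATE EXTERNAL TABLE `my_project.aws_ds.orders_s3`
WITH CONNECTION `my_project.aws-us-east-1.s3_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-orders-bucket/orders/*']
);

-- The scan and aggregation run in AWS next to the data;
-- only the small result set comes back.
SELECT order_status, COUNT(*) AS order_count
FROM `my_project.aws_ds.orders_s3`
GROUP BY order_status;
```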

Metadata caching and performance

When metadata caching is enabled, Lakehouse caches file names, partition information, and physical file metadata such as row counts. That sounds like a small detail but it is one of the practical reasons to convert an external table into a Lakehouse table. Query planning gets faster because BigQuery does not need to re-list object storage to find files and partitions on every query. For large partitioned datasets, this is the difference between a query that returns in seconds and one that spends most of its time on metadata operations.
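Caching is configured on the table definition. The sketch below reuses hypothetical names and picks a four-hour staleness window purely as an example. Choose a value that matches how often new files land.

```sql
-- Hypothetical names; the 4-hour window is an arbitrary example value.
CREATE OR REPLACE EXTERNAL TABLE `my_project.lake_ds.sales_biglake`
WITH CONNECTION `my_project.us.gcs_conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-sales-bucket/sales/*.parquet'],
  -- Queries may use cached metadata up to this old before falling back
  -- to reading metadata from object storage.
  max_staleness = INTERVAL 4 HOUR,
  -- AUTOMATIC lets the service refresh the cache in the background.
  metadata_cache_mode = 'AUTOMATIC'
);

-- With metadata_cache_mode = 'MANUAL' instead, refresh on your own schedule:
-- CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('my_project.lake_ds.sales_biglake');
```

The trade-off is freshness against planning speed: a longer max_staleness means fewer metadata reads, but newly written files can take longer to show up in query results.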

What to memorize before the exam

If you only have time to lock in a few facts before sitting the Professional Data Engineer exam, here is what I would prioritize:

  • Lakehouse tables sit on object storage but behave like native BigQuery tables.
  • They support row-level and column-level security. External tables do not.
  • Column-level security uses Knowledge Catalog (formerly Data Catalog) policy tags and the Fine-Grained Reader role.
  • Lakehouse plus BigQuery Omni is the multi-cloud analytics story. Storage stays put, compute runs in-region.
  • Lakehouse integrates with Knowledge Catalog (formerly Data Catalog and Dataplex) for governance and metadata management.
  • Multiple engines, including Spark, Presto, Trino, and Hive, can read Lakehouse tables through connectors.

Most Lakehouse questions on the Professional Data Engineer exam reduce to two patterns. Either you need fine-grained security on data that lives outside BigQuery storage, or you need a unified query layer across clouds. Both lead to Lakehouse.

My Professional Data Engineer course covers Lakehouse alongside the rest of the storage and analytics services you need for the exam, including the comparisons with external tables, BigQuery Omni, and Knowledge Catalog (formerly Dataplex) that show up in scenario questions.
