
BigQuery is GCP's fully managed, serverless data warehouse: relational-style tables queried with SQL, built for analytics at scale. For the Professional Cloud Architect exam, that one sentence carries a lot of weight, so I want to unpack it.
Fully managed and serverless means you do not provision a cluster, pick instance sizes, patch nodes, or worry about underlying VMs. You create a project, you create a dataset, you load data, and you query. Storage and compute both auto-scale, which matters in architectural diagrams because BigQuery is the layer you reach for when the workload size is unknown or expected to grow into terabytes and petabytes.
It handles both batch and streaming loads. That detail comes up on the PCA exam any time a question describes a pipeline where Pub/Sub or Dataflow is feeding analytics. BigQuery is comfortable as the sink in either pattern.
It uses SQL over relational-style tables, so analysts already familiar with SQL are productive on day one. Keep in mind, though, that it is an analytical (OLAP) engine, not a transactional database; OLTP workloads belong on Cloud SQL or Spanner. It is excellent at both storage and analysis, which is why I describe it as GCP's flagship data product. Some teams adopt GCP specifically to use BigQuery and integrate it into architectures that otherwise live on AWS or Azure.
You have three primary access paths, and the Professional Cloud Architect exam expects you to know all three.
The first is the Cloud Console. This is the interactive web UI where you write queries, browse datasets, and view results. It is where most people start, and it is the right answer when a question asks how a non-developer analyst should explore data.
The second is the bq command line tool. This ships with the Google Cloud SDK and is the natural fit for scripting, automation, and CI workflows. When a PCA scenario describes operators automating dataset creation or scheduled query execution from a pipeline, bq is the tool in play.
The third is the client libraries. BigQuery has official libraries for Go, Python, Java, Node.js, PHP, Ruby, and C#. These are how applications integrate with BigQuery directly, embedding queries inside services rather than running them out of band.
BigQuery supports two SQL dialects, and only one of them is the right answer.
Standard SQL (which Google's documentation now calls GoogleSQL) is the preferred dialect. It conforms to the SQL 2011 standard, so it looks and behaves like the SQL most engineers already know. It also supports BigQuery's nested and repeated field model, which is one of the features that makes BigQuery distinctive for analytics on semi-structured data.
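As a sketch of what the nested and repeated field model looks like in practice, assume a hypothetical orders table whose items column is a repeated STRUCT of sku and quantity. Standard SQL flattens the repeated field with UNNEST:

```sql
-- Hypothetical table: `project.dataset.orders` with a repeated
-- STRUCT column `items` (fields: sku STRING, quantity INT64).
SELECT
  o.order_id,
  item.sku,
  item.quantity
FROM `project.dataset.orders` AS o
CROSS JOIN UNNEST(o.items) AS item;
```

Each element of the repeated field becomes its own output row joined back to its parent order, which is how BigQuery lets you keep semi-structured data in a single table without maintaining a separate child table.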
Legacy SQL is the older dialect, originally just called BigQuery SQL. It is non-standard, supported only for backward compatibility, and not recommended for new work. New features and optimizations land in Standard SQL.
The syntactic difference matters because PCA questions sometimes show query snippets. In Standard SQL, joins are explicit:
SELECT a.column1, b.column2
FROM `project.dataset.tableA` AS a
JOIN `project.dataset.tableB` AS b
ON a.id = b.id;
The JOIN keyword and ON clause make the relationship between the tables explicit, which the query planner can optimize and which is harder to misread.
In Legacy SQL, the same query is written with implicit joins:
SELECT a.column1, b.column2
FROM [project:dataset.tableA] a, [project:dataset.tableB] b
WHERE a.id = b.id;
The tables are listed with a comma and the join condition lives in the WHERE clause. There is no JOIN keyword. This is more error-prone because forgetting the WHERE condition produces a silent cross join, which can blow up cost and runtime. The bracketed table reference syntax with a colon between project and dataset is also a Legacy SQL signal worth recognizing on sight.
If a PCA exam question asks which dialect to use, the answer is Standard SQL.
BigQuery's resource hierarchy is short, and it maps cleanly onto how you think about data ownership and IAM.
At the top is the project. There is no BigQuery instance to provision or size; the project itself is the top-level container, so when you talk about "BigQuery in project X," you are talking about a single namespace tied to that project's billing, IAM, and quotas.
Inside the project, you have datasets. A dataset is a logical container for related tables. Datasets are where you set the geographic location of your data (US, EU, or a specific region), and they are the unit at which you typically grant access. If a team needs to read sales data, you grant them a role on the sales dataset rather than table by table.
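Dataset creation, including pinning the location, can be done with Standard SQL DDL as well as through the Console. A minimal sketch, assuming a hypothetical sales dataset pinned to the EU multi-region:

```sql
-- Hypothetical dataset. CREATE SCHEMA is BigQuery's DDL statement
-- for datasets; the location is fixed at creation time and cannot
-- be changed afterward.
CREATE SCHEMA `project.sales`
OPTIONS (
  location = 'EU',
  description = 'Sales data, EU residency'
);
```

Because location is immutable after creation, data residency has to be decided when the dataset is created, which is exactly why it is the lever in residency-focused exam scenarios.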
Inside each dataset, you have tables. Tables hold the actual rows and columns. They can be standard tables, partitioned tables, clustered tables, external tables that point at Cloud Storage or other sources, or views.
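A sketch of the partitioned and clustered variants mentioned above, using hypothetical table and column names:

```sql
-- Hypothetical table, partitioned by day and clustered by customer.
CREATE TABLE `project.sales.orders`
(
  order_id    STRING,
  customer_id STRING,
  order_ts    TIMESTAMP,
  total       NUMERIC
)
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id;
```

Partitioning prunes the data scanned when queries filter on order_ts, and clustering sorts rows within each partition by customer_id. Both reduce bytes scanned, which is what you pay for under BigQuery's on-demand pricing model.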
So the flow is project to dataset to table. That hierarchy shows up directly in the fully qualified name you use in Standard SQL: `project.dataset.table`. When a PCA scenario describes data residency requirements, dataset location is the lever. When it describes access control, dataset-level IAM is usually the right granularity. When it describes an architecture that spans environments, you typically have a project per environment, each with its own datasets.
For the Professional Cloud Architect exam, the BigQuery basics you need locked in are these: it is a fully managed, serverless data warehouse; you can reach it from the Cloud Console, the bq CLI, or the client libraries; you should always use Standard SQL for new work; and the resource hierarchy is project to dataset to table, with the dataset being the level where location and most access decisions live.
My Professional Cloud Architect course covers BigQuery overview and resource hierarchy alongside the rest of the storage and analytics material.