Relational vs Non-Relational Databases: PDE Exam Guide

June 11, 2025

If you sit down to take the Google Cloud Professional Data Engineer exam without a clean mental model for relational versus non-relational databases, you are going to flip a coin on a lot of scenario questions. The exam loves to drop a paragraph about a workload and ask you to pick a storage service. The fastest way to answer those is to first decide which family the workload belongs to, and then narrow down to a specific Google Cloud product.

I want to walk through how I think about that decision when I am studying, and how I teach it inside my Professional Data Engineer prep.

The two families

Databases on the exam split into two big buckets. Relational, sometimes called SQL, stores data in tables with rows and columns under a predefined schema. Non-relational, often called NoSQL, is a broader category that covers anything that does not fit neatly into that tabular shape.

On the Google Cloud side, the relational services you should know are Cloud SQL, Cloud Spanner, and BigQuery. Cloud SQL and Spanner are transactional databases, while BigQuery is a SQL data warehouse built for analytics. The non-relational services are Cloud Bigtable, Firestore, and Memorystore. Outside Google Cloud, the exam expects you to recognize names like MySQL, PostgreSQL, Oracle, and SQL Server on the relational side, and MongoDB, Cassandra, Redis, and DynamoDB on the non-relational side.

What makes a database relational

Relational databases are highly structured and standardized. The schema is defined up front. Every column has a type. Every row has to conform. That rigidity is a feature, not a bug. It is what lets relational engines guarantee data integrity and run rich joins across tables without surprises.

Three things matter on the exam:

SQL is standardized. Almost every analyst, BI tool, and data science library speaks it. That makes relational the default when the question mentions reporting, dashboards, or ad-hoc analytics.
ACID compliance. Transactions are atomic, consistent, isolated, and durable. If a question describes financial transactions, order processing, or anything where partial writes are unacceptable, you are almost certainly in relational territory.
Schema rigidity. This is the trade-off. Changing the structure of a relational table is painful, and shoving semi-structured or unstructured data into rows and columns is awkward.

The other classic relational weakness is scale. Traditional relational engines scale vertically: writes go to a single primary machine, and that machine starts to strain as data volume and write traffic grow. Cloud Spanner is the exception worth memorizing because it gives you relational semantics with horizontal scale, which is why it shows up so often in global, strongly consistent scenarios on the exam.
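To make the ACID point concrete, here is a minimal sketch of an all-or-nothing transfer. It uses Python's built-in sqlite3 module purely as a stand-in relational engine, not Cloud SQL or Spanner, and the accounts table, make_db, transfer, and balances names are made up for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two demo accounts (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # The schema is declared up front; the CHECK constraint forbids overdrafts.
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [(1, 100), (2, 50)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())


conn = make_db()
print(transfer(conn, 1, 2, 30), balances(conn))   # succeeds, both rows change
print(transfer(conn, 1, 2, 500), balances(conn))  # fails, neither row changes
```

The second transfer credits account 2 before the debit on account 1 fails, yet the balances afterward are unchanged. That rollback of a partial write is the behavior the exam is hinting at when a scenario mentions payments or order processing.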

What makes a database non-relational

Non-relational databases are built to store and manage large volumes of data that do not fit a tabular structure. Inside that category, there are four sub-models you should be able to recognize on sight.

Document store. Data is stored as documents that look like JSON, with nested fields. Firestore is the Google Cloud example. MongoDB is the canonical third-party example. Use it when each record might have a different shape.
Key-value. Every record is a key mapped to an opaque value. Redis and Memorystore are key-value. Great for caching, session storage, and lookups where you only ever access data by a single identifier.
Column-family. Wide-column stores like Cloud Bigtable and Cassandra. Optimized for huge volumes of time series, IoT, or operational data with low-latency reads and writes.
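Graph. Nodes and edges. Built for queries about relationships between entities, like a social network or a fraud ring. On Google Cloud the first-party option is Spanner Graph, which adds graph queries on top of Spanner, and Neo4j is the third-party name you are most likely to see. Even when no product is named, the model is fair game conceptually.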

The advantages line up the same way every time. Non-relational scales horizontally, handles flexible or evolving data, and tends to deliver higher read and write throughput at scale. The trade-offs are transactional guarantees that are often weaker or narrower in scope than a relational engine's, no universal query language, and more friction for analytics workloads.
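If the four sub-models blur together, it can help to see the shape of the data in each one. The sketch below uses plain Python dictionaries, not any real client library, and every key, field, and helper name in it is invented for illustration.

```python
# Document: nested fields, and two records need not share the same shape.
documents = {
    "user_1": {"name": "Ana", "address": {"city": "Lisbon"}},
    "user_2": {"name": "Bo", "tags": ["beta"], "plan": "pro"},
}

# Key-value: the store only knows get/set by key; the value is an opaque blob.
cache = {"session:abc123": b'{"user": "user_1"}'}

# Wide-column: rows are addressed by a row key and hold sparse columns
# grouped into families. Keys are designed so related rows sort together.
wide_rows = {
    "sensor42#2025-06-11T10:00": {"metrics": {"temp": 21.5}},
    "sensor42#2025-06-11T10:01": {"metrics": {"temp": 21.7, "humidity": 40}},
    "sensor77#2025-06-11T10:00": {"metrics": {"temp": 19.0}},
}


def scan_prefix(rows, prefix):
    """Return the row keys starting with `prefix`, in sorted key order."""
    return [key for key in sorted(rows) if key.startswith(prefix)]


# Graph: nodes plus directed edges; queries follow relationships.
edges = {"alice": {"bob"}, "bob": {"carol"}, "carol": set()}


def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


print(scan_prefix(wide_rows, "sensor42#"))
print(reachable(edges, "alice"))
```

Notice that each model comes with its own natural query: fetch a whole document, get a value by key, scan a range of row keys, or walk edges. None of them is a join, which is the practical meaning of "more friction for analytics workloads."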

How to pick on a scenario question

When I read a Professional Data Engineer scenario, I run through a short checklist before I even look at the answer choices.

Is the data structured and does it have a stable schema? If yes, lean relational.
Does the workload need transactions or strong consistency across multiple rows? Relational, and probably Spanner if the question mentions global scale.
Is the workload analytics, reporting, or warehousing? BigQuery.
Is the data semi-structured, deeply nested, or likely to change shape? Document store, so Firestore.
Is the access pattern a single lookup by ID or a cache in front of something slower? Key-value, so Memorystore.
Is it massive time-series, IoT telemetry, or operational data with simple keys and low-latency requirements? Wide-column, so Bigtable.

That little flow handles a surprising share of database questions on the exam. The rest usually come down to a tighter detail, like a region requirement or a latency target, that pushes you between two services in the same family.
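The checklist can be written down as a small decision function. This is only a study aid under my own assumptions: the Workload fields and suggest_service are hypothetical names, and falling back to Cloud SQL when a relational workload is not global is my simplification, not an exam rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Flags you might pull out of a scenario paragraph (hypothetical names)."""
    needs_transactions: bool = False
    global_scale: bool = False
    analytics: bool = False
    flexible_schema: bool = False
    key_lookup_or_cache: bool = False
    high_volume_time_series: bool = False


def suggest_service(w: Workload) -> str:
    """Walk the checklist in order and return the first matching service."""
    # Hard requirements come first: they override "it would scale better".
    if w.needs_transactions:
        return "Cloud Spanner" if w.global_scale else "Cloud SQL"
    if w.analytics:
        return "BigQuery"
    if w.flexible_schema:
        return "Firestore"
    if w.key_lookup_or_cache:
        return "Memorystore"
    if w.high_volume_time_series:
        return "Bigtable"
    # Structured data with a stable schema and nothing unusual: lean relational.
    return "Cloud SQL"


print(suggest_service(Workload(needs_transactions=True, global_scale=True)))
print(suggest_service(Workload(high_volume_time_series=True)))
print(suggest_service(Workload(high_volume_time_series=True, analytics=True)))
```

Checking transactions and analytics before anything else is deliberate. A workload can look like a Bigtable fit on volume alone and still belong on a relational service once the scenario mentions SQL analytics or multi-row transactions.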

The one trap to watch for

The trap is assuming non-relational is always the right call because it scales better. Plenty of exam questions describe a workload that could run on Bigtable or Firestore, but the right answer is still Spanner or BigQuery because the scenario also mentions joins, transactions, or SQL analytics. Read the requirements twice before you commit.

My Professional Data Engineer course covers each of these database categories in depth, including the specific Google Cloud products in each family and the scenario patterns that map to them on the exam.
