Firestore Data Model for the PCA Exam: Collections, Documents, Subcollections

November 24, 2025

Firestore organizes data as a tree of collections and documents, with optional subcollections nested under documents. This shape is fundamentally different from a relational schema, and the Professional Cloud Architect exam expects you to understand how it lets you model real-world hierarchies without joins. I want to walk through the model carefully so you can reason about it on the exam and in design questions.

The three building blocks

Firestore has exactly three structural primitives. A collection contains one or more documents. A document optionally contains one or more subcollections. A subcollection contains one or more documents, and those documents can in turn contain their own subcollections. The pattern repeats as deep as you need.

The important constraint is the alternation. You cannot put a document directly inside another document, and you cannot put a collection directly inside another collection. The hierarchy always alternates collection, document, subcollection, document, subcollection, document. This is the rule that determines every path in Firestore.
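A quick way to internalize the alternation rule is to count path segments. The sketch below is plain Python, not the Firestore client library, and the helper name classify_path is my own invention. It only inspects the shape of a path: an odd number of segments ends on a collection, an even number ends on a document.

```python
def classify_path(path: str) -> str:
    """Classify a slash-separated Firestore-style path by segment count.

    Hypothetical helper for illustration; it only checks the shape of the
    path and never talks to Firestore.
    """
    segments = path.strip("/").split("/")
    if any(segment == "" for segment in segments):
        raise ValueError(f"empty segment in path: {path!r}")
    # Paths alternate collection, document, collection, document, ...
    # so an odd segment count ends on a collection, an even count on a document.
    return "collection" if len(segments) % 2 == 1 else "document"


print(classify_path("users"))                      # collection
print(classify_path("users/user1"))                # document
print(classify_path("users/user1/orders"))         # collection
print(classify_path("users/user1/orders/order1"))  # document
```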

Documents hold the fields

Fields live on documents, not on collections. A document is a set of key-value pairs, where the values can be primitives like strings and numbers, or nested maps and arrays. Collections themselves carry no schema and no fields. They are simply containers that group documents under a shared path.
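To make that concrete, here is roughly what one document's fields look like when written as a Python dictionary. The field names and values are invented for illustration. The point is only that primitives, nested maps, and arrays all sit on the document, while the collection contributes nothing but a path segment.

```python
# Hypothetical fields for one user document; the values are invented.
user_doc = {
    "name": "Ada Lovelace",                 # string primitive
    "loyaltyPoints": 120,                   # number primitive
    "address": {                            # nested map
        "city": "London",
        "postalCode": "N1 9GU",
    },
    "tags": ["newsletter", "beta-tester"],  # array
}

# The collection contributes nothing but the path segment in front of the
# document ID; every field above belongs to the document.
collection_id, document_id = "users", "user1"
document_path = f"{collection_id}/{document_id}"
print(document_path, sorted(user_doc))
```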

When you query Firestore, you query a collection (or a collection group) and Firestore returns matching documents. The collection is the addressable scope, and the document is the unit of data.

Subcollections express ownership

Subcollections are how Firestore expresses one-to-many relationships without foreign keys. If a user has many orders, you put an orders subcollection under each user document. The orders belong to that specific user by virtue of their path. There is no join, no parent ID field, no referential integrity check. The relationship is the path itself. That also means nothing cascades: deleting a user document does not delete the orders beneath it.

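This matters for the Professional Cloud Architect exam because the design implication is significant. You are choosing a model where related data is colocated by hierarchy, so a user's orders always sit at a known path under that user and can be read without filtering on an owner field. Reads are shallow, though: fetching the user document does not return its subcollections, so the user and the orders are separate reads. The trade-off is that querying across users (for example, all orders placed yesterday across every user) requires a collection group query, because a query on one orders subcollection only sees that one user's orders.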
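The difference between querying one subcollection and querying a collection group is easier to see with a toy model. The sketch below is an in-memory stand-in I made up for illustration, not the Firestore API: STORE, list_collection, and list_collection_group are all hypothetical names, and the data is invented.

```python
# Toy in-memory store: full document path -> fields. All data is invented.
STORE = {
    "users/user1": {"name": "Ada"},
    "users/user2": {"name": "Grace"},
    "users/user1/orders/order1": {"status": "shipped"},
    "users/user1/orders/order2": {"status": "pending"},
    "users/user2/orders/order1": {"status": "shipped"},
}


def list_collection(collection_path: str) -> list[str]:
    """Return paths of documents directly inside one specific collection."""
    prefix = collection_path.strip("/") + "/"
    return sorted(
        path for path in STORE
        # A direct child adds exactly one segment (the document ID).
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    )


def list_collection_group(collection_id: str) -> list[str]:
    """Return paths of documents in every collection with this ID."""
    return sorted(
        path for path in STORE
        # The segment just before the document ID is the parent collection ID.
        if path.split("/")[-2] == collection_id
    )


print(list_collection("users/user1/orders"))  # only user1's orders
print(list_collection_group("orders"))        # orders under every user
```

In the real Python client library the second shape corresponds to a collection_group("orders") query on the client object. Collection group queries generally need an index with collection group scope, so treat that as something to confirm in the documentation before relying on it.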

A worked example

Consider an ecommerce store. At the top level there is a users collection. Each user is a document inside that collection. user1 might hold fields like name, email, and address. user2 holds the same kinds of fields with different values.

Under each user document, there is an orders subcollection. user1 has its own orders subcollection containing order1 and order2. user2 has a separate orders subcollection containing its own order documents. Each order document carries fields like orderDate, totalAmount, and status.

Then under each order document there is another subcollection called orderItems. Each item in the order is its own document, with fields like product ID, quantity, and price. So a single product line ends up at a path like users/user1/orders/order1/orderItems/item1.

That path is the entire address. It encodes the relationship, the ownership, and the location of the data in one string. To fetch every item in a specific order, you read the subcollection at users/user1/orders/order1/orderItems. To fetch every order for a user, you read the subcollection at users/user1/orders. To fetch a user's profile, you read the document at users/user1.
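Because ownership is encoded in the path, you can recover every owner of a document from the string alone. The helpers below are a small sketch in plain Python with names I made up (document_path and ancestor_documents). They do not call Firestore; they just manipulate the segments from the ecommerce example.

```python
def document_path(*segments: str) -> str:
    """Join alternating collection and document IDs into a document path."""
    if len(segments) == 0 or len(segments) % 2 != 0:
        raise ValueError("a document path needs an even number of segments")
    return "/".join(segments)


def ancestor_documents(path: str) -> list[str]:
    """List the documents that own this one, nearest owner first."""
    segments = path.split("/")
    owners = []
    # Dropping the last two segments (collection ID, document ID) moves up
    # one level of ownership.
    while len(segments) > 2:
        segments = segments[:-2]
        owners.append("/".join(segments))
    return owners


item = document_path("users", "user1", "orders", "order1", "orderItems", "item1")
print(item)
print(ancestor_documents(item))
```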

Why this matters for the exam

When you see a Professional Cloud Architect question that involves Firestore, the design hint is usually about how data should be grouped. If the workload is "fetch a user and everything they own," subcollections are the natural answer because the hierarchy lines up with the access pattern. If the workload is "scan across all entities of a type regardless of owner," you need to think about collection group queries or a flatter top-level collection.

The other thing to remember is that there is no schema enforcement at the collection level. Two documents in the same collection can have entirely different fields. This is liberating during development and dangerous at scale, so the exam will sometimes test whether you know that validation lives in security rules and application code, not in the data model itself. Security rules apply to requests from the mobile and web client SDKs; the server client libraries bypass them, so backend writes have to be validated in your own code.
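Application-side validation can be as simple as a function that checks a document's fields before you write it. The sketch below is one way to do that, assuming the order fields from the worked example. The function name validate_order and the allowed status values are my own assumptions, not anything Firestore defines.

```python
from numbers import Real

# Hypothetical rules for an order document; the field names come from the
# worked example, the allowed status values are invented.
ALLOWED_STATUSES = ("pending", "shipped", "delivered")


def validate_order(fields: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for required in ("orderDate", "totalAmount", "status"):
        if required not in fields:
            problems.append(f"missing field: {required}")
    total = fields.get("totalAmount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if "totalAmount" in fields and (
        isinstance(total, bool) or not isinstance(total, Real) or total < 0
    ):
        problems.append("totalAmount must be a non-negative number")
    if "status" in fields and fields["status"] not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {fields['status']!r}")
    return problems


print(validate_order({"orderDate": "2025-11-24", "totalAmount": 42.5, "status": "shipped"}))
print(validate_order({"status": "lost"}))
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a document in one pass.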

My Professional Cloud Architect course covers the Firestore data model alongside the rest of the databases material.