Firestore Data Modeling and Exploding Indexes for the PDE Exam

May 11, 2026

Firestore shows up on the Professional Data Engineer exam in a way that catches people off guard. It is not just "the serverless document database." The exam wants you to reason about how documents are organized, how queries are served, and why an index strategy can either keep a workload fast and cheap or quietly drive write costs through the roof. I want to walk through the two pieces that matter most for Professional Data Engineer prep: the Firestore data model, and the exploding composite index problem.

The Firestore data model: collections, documents, subcollections

Firestore is a document database, but the hierarchy matters. The building blocks are simple:

Collection: a container that holds one or more documents. Collections themselves do not store fields. They only group documents.
Document: a record identified by an ID, holding fields and values. A document can optionally contain one or more subcollections.
Subcollection: a collection that lives underneath a parent document. It holds its own documents, which can have their own subcollections, and the nesting can continue.

The model is meant to mirror real-world relationships. Take an ecommerce app. At the top sits a users collection. Each user is a document with fields like name, email, and address. Underneath each user document sits an orders subcollection, where each order is its own document with fields like orderDate, totalAmount, and status. Each order document can then have an orderItems subcollection, where each item document carries fields like productId, quantity, and price.

That gives you a clean path from a user, to their orders, to the items inside each order. The hierarchy is not just visual organization. Queries scoped to a subcollection only see documents inside that subcollection, which is how you keep a user's orders separate from every other user's orders without joining anything.
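The path from a user to an order to an item can be sketched in plain Python. This is only a toy model of how Firestore paths alternate between collection IDs and document IDs; the helper names and the IDs (alice, bob, order-1001, and so on) are made up for illustration, and nothing here calls the Firestore client library.

```python
def document_path(*segments: str) -> str:
    """Join alternating collection and document IDs into a document path."""
    # Document paths alternate collection/document, so the segment count is even.
    if not segments or len(segments) % 2 != 0:
        raise ValueError("document paths need an even number of segments")
    return "/".join(segments)


def parent_collection(path: str) -> str:
    """Return the path of the collection that directly holds the document."""
    return path.rsplit("/", 1)[0]


# Hypothetical IDs following the users -> orders -> order items hierarchy.
item = document_path("users", "alice", "orders", "order-1001", "orderItems", "item-1")
alice_order = document_path("users", "alice", "orders", "order-1001")
bob_order = document_path("users", "bob", "orders", "order-2002")

print(item)
print(parent_collection(alice_order))
# The two orders live in different subcollections, so a query scoped to
# one user's orders never sees the other user's documents.
print(parent_collection(alice_order) == parent_collection(bob_order))
```

The last line prints False because the two orders sit under different parent documents, which is the parent-scoping behavior described above.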

For the Professional Data Engineer exam, the things to internalize are:

A collection contains documents, not fields.
A document contains fields and can contain subcollections.
Subcollections enable hierarchical, parent-scoped queries.
Firestore has no joins. You model your hierarchy so the data you fetch together lives together.

How Firestore indexes queries

Every Firestore query is served by an index. There is no fallback to a full collection scan of the kind a relational database would run. If no index can satisfy the query, the query fails with an error instead of running slowly.

Firestore creates two kinds of indexes:

Single-field indexes: created automatically for each field in a document, in both ascending and descending order, plus array-contains for array fields. These let you filter or sort on one field at a time.
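Composite indexes: needed when a query combines fields in a way single-field indexes cannot serve, such as an equality filter on one field with a range filter or sort on another. You define these yourself in the index configuration file, or create them from the link in the error Firestore returns when a query fails for lack of one.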

The composite index is where the trouble starts.

Exploding composite indexes

Here is the rule that makes this dangerous: a composite index stores an entry for every combination of values a document holds across the indexed fields. With single-valued fields that is one entry per document per index. When the indexed fields are arrays or otherwise multi-valued, the number of entries multiplies, and high-cardinality fields leave more room for it to grow.

The classic example is a job listing document with these fields:

Industry: 6 values (Tech, Finance, Healthcare, Education, Government, Consulting)
Job Type: 4 values (Full-time, Part-time, Contract, Internship)
Experience Level: 4 values (Entry-level, Mid-level, Senior-level, Executive)
Location: 300 cities
Salary Range: 5 buckets

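If you build a composite index across all five fields and a listing can carry several values per field, the index needs an entry for every combination the listing holds. In the worst case, a listing tagged with every value of every field produces:

6 industries x 4 job types x 4 experience levels x 300 cities x 5 salary ranges = 144,000 combinations per document

That is 144,000 index entries for a single document. Real listings carry fewer values per field, but the multiplication works the same way. Spread across a few million job listings, you have an index that is expensive to maintain, slows every insert, and bloats storage.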
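The multiplication is easy to check in a few lines of Python. The value counts come from the job listing example; the dictionary and variable names are just for illustration. The second calculation shows how much the count drops if the highest-cardinality field is left out of the composite.

```python
import math

# Distinct values per field, taken from the job listing example.
values_per_field = {
    "industry": 6,
    "job_type": 4,
    "experience_level": 4,
    "location": 300,
    "salary_range": 5,
}

# Every combination across all five fields.
combinations = math.prod(values_per_field.values())
print(f"{combinations:,}")  # 144,000

# Same product with the 300-value location field left out.
without_location = math.prod(
    count for field, count in values_per_field.items() if field != "location"
)
print(f"{without_location:,}")  # 480
```

Dropping one field from the composite divides the count by that field's number of values, which is why trimming composites to real query patterns pays off so quickly.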

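The explosion is most direct when several array fields share one composite index. This is the exploding index case documented for Firestore in Datastore mode: the index gets an entry for every element of the Cartesian product of the arrays. Three arrays of ten elements each is 10 x 10 x 10 = 1,000 entries per document. Native mode limits a composite index to one array field, so there the entry count grows with the length of that array and with the number of indexes that include it.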
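To see why three arrays of ten elements reach a thousand entries, here is a small Python sketch of the Cartesian product. It models only the counting, not Firestore's actual index storage format, and the field names (skills, locations, benefits) are hypothetical.

```python
from itertools import product

# A hypothetical document with three array fields of ten elements each.
doc = {
    "skills": [f"skill-{i}" for i in range(10)],
    "locations": [f"city-{i}" for i in range(10)],
    "benefits": [f"benefit-{i}" for i in range(10)],
}

# One entry per combination of one element from each array.
entries = list(product(doc["skills"], doc["locations"], doc["benefits"]))

print(len(entries))  # 1000
print(entries[0])    # ('skill-0', 'city-0', 'benefit-0')
```

Adding an eleventh element to any one array adds another hundred combinations, so the growth is multiplicative rather than additive.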

How to prevent the explosion

Firestore gives you two main levers, and the Professional Data Engineer exam expects you to know both.

The first lever is manual index configuration. Rather than letting Firestore auto-suggest composite indexes for every query you happen to run, you maintain an index configuration file that defines only the composites your application actually needs. If your job board never filters by all five fields at once, you do not build the five-field composite. You build the two or three composites that match real query patterns.

The second lever is single-field index exemptions. By default Firestore indexes every field in every document for single-field queries. For array fields or large nested map fields that you never query on, you can exempt them from indexing entirely. This stops the array-contains and ordered indexes from being created on those fields, which trims index storage and write latency for every document that carries them. You can also exempt subfields inside a map, or disable ascending or descending order independently.
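Both levers end up in the same index configuration file. The sketch below builds a structure in the shape of the firestore.indexes.json file used by the Firebase CLI, written as a Python dict so it can be printed as JSON. The collection name jobs and the field names industry, location, postedAt, and keywords are hypothetical, and the layout should be checked against the current Firestore documentation before use.

```python
import json

index_config = {
    # Lever 1: define only the composite indexes real queries need.
    "indexes": [
        {
            "collectionGroup": "jobs",  # hypothetical collection
            "queryScope": "COLLECTION",
            "fields": [
                {"fieldPath": "industry", "order": "ASCENDING"},
                {"fieldPath": "location", "order": "ASCENDING"},
                {"fieldPath": "postedAt", "order": "DESCENDING"},
            ],
        },
    ],
    # Lever 2: exempt a field that is never queried from single-field indexing.
    "fieldOverrides": [
        {
            "collectionGroup": "jobs",
            "fieldPath": "keywords",  # hypothetical array field
            "indexes": [],  # an empty list turns off indexing for this field
        },
    ],
}

print(json.dumps(index_config, indent=2))
```

Here the job board gets one three-field composite instead of a five-field one, and the keywords array is left out of indexing altogether.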

What to remember for the exam

If you see a Firestore question on the Professional Data Engineer exam, anchor on three checks:

Does the scenario describe a parent-child relationship that fits subcollections, or is it flat data that fits a single collection?
Are the queries single-field, or do they need composite indexes?
Are any of the indexed fields arrays or high-cardinality? If so, the exploding index trap is in play, and the right answer involves manual index configuration or single-field exemptions.

My Professional Data Engineer course covers Firestore data modeling, composite index design, the exploding index pattern, and the rest of the storage and database section of the exam, with every concept tied back to the question patterns Google actually uses.