
Cloud Spanner secondary indexes are one of those Professional Data Engineer exam topics that look simple on the surface and then become a trade-off question once you push on them. The exam likes to ask whether you should add an index at all, what columns to put in it, and how to keep a heavy read path from bouncing back to the base table. If you can answer those three questions cleanly, you can answer most Spanner index questions on the exam.
I want to walk through the model I use when I see a Spanner indexing scenario, because it lines up with how the Professional Data Engineer exam frames the choice.
Spanner is a distributed relational database. Every table has a primary key, and rows are physically ordered and split into ranges based on that key. That works beautifully when your query filters on the primary key, because Spanner can go straight to the right split and read sequentially. It works poorly when your query filters on a non-primary-key column, because now Spanner has to scan rows to find what you want.
A secondary index is a separate, sorted structure keyed on the columns you specify. When a query filters on those columns, Spanner can use the index to jump directly to the matching rows instead of scanning the base table. That is the read-side win.
For the exam, the cue is usually a query pattern that filters or orders by a column that is not in the primary key. If you see WHERE non_pk_column = ? running slowly, the answer is almost always a secondary index on that column.
Here is where the Professional Data Engineer exam gets more specific. A plain secondary index only contains the indexed columns and a pointer back to the base table row. If your query selects columns that are not in the index, Spanner has to do a second lookup against the base table for every matching row. That extra hop is sometimes called a table-back read or a base-table join, and on a hot query it is exactly what you want to avoid.
The fix is the STORING clause. You tell Spanner to copy additional columns into the index itself so the query can be answered entirely from the index. The index becomes a covering index for that query.
CREATE INDEX SongsByGenre
ON Songs(Genre)
STORING (Title, DurationSeconds);If a query selects only Genre, Title, and DurationSeconds and filters by Genre, Spanner reads the index and never touches the Songs table. On the exam, when a scenario describes a read-heavy workload that pulls a known set of columns and the query is slow, the right answer is usually a covering index built with STORING, not a bigger machine or a different database.
Spanner has a second indexing concept that the Professional Data Engineer exam likes to test, and that is the interleaved index. Spanner already lets you interleave child tables inside a parent table so that rows for the same parent key are stored physically close together. The same idea applies to indexes.
An interleaved index is rooted under a parent row, which means the index entries for a given parent live in the same split as the parent. When your query already knows the parent key and is filtering child rows by some other column, an interleaved index keeps the read local to a single split instead of fanning out across the cluster.
CREATE INDEX AlbumsByTitle
ON Albums(SingerId, AlbumTitle),
INTERLEAVE IN Singers;If the exam describes a parent-child relationship like Singers and Albums and asks how to make per-singer lookups by album title fast without scattering reads across the cluster, interleaved index is the answer. A plain global index would also work for correctness, but it would not give you the locality.
The trade-off the Professional Data Engineer exam wants you to articulate is write amplification. Every secondary index is a separate structure that Spanner has to keep in sync. When you insert, update, or delete a row in the base table, Spanner also has to update every index that references the affected columns. More indexes means more work per write, more storage, and more replication overhead.
A few practical implications that show up on the exam:
When I see a Spanner index question on the Professional Data Engineer exam, I run the same three checks. First, what is the query filtering on, and is that column in the primary key. If not, a secondary index is on the table. Second, what columns does the query select, and are they all in the index. If not, STORING is on the table. Third, is there a parent-child relationship and does the query already know the parent. If yes, interleaved index is on the table. Then I ask whether the write cost is worth it for the workload described.
That sequence lines up with how Spanner index questions are written. If you can run it in under thirty seconds in the exam room, you will not get pulled into the wrong answer choices.
My Professional Data Engineer course covers Cloud Spanner schema design, secondary indexes, the STORING clause, and interleaved tables in the database section, along with the rest of the storage and analytics services you need for the exam.