Bigtable Schema Design Use Case for the PCA Exam: Fleet of Cars

GCP Study Hub
Ben Makansi
December 30, 2025

The Professional Cloud Architect exam loves a Bigtable schema design question, and the fleet-of-cars scenario is one of the cleanest examples of how row key choice makes or breaks a deployment. I want to walk through it the way it shows up on the test so you can recognize the pattern instantly when you see it.

The scenario

Imagine a fleet of thousands of delivery vehicles, each equipped with multiple IoT sensors that continuously report metrics. There are different sensors for speed, fuel level, temperature, and location. These sensors send data every few seconds, and in some cases multiple times per second. That creates a massive volume of time-series data that has to be stored efficiently in Bigtable.

The question the exam is really asking: how do you optimize the schema design for both efficient writes and efficient reads at this scale?

The schema that works

Here is the design that performs well for both reads and writes:

  • Row key: vehicle_id#timestamp
  • Column family: sensor_data
  • Column qualifier: sensor_type (speed, fuel_level, temperature, location)

Each row key looks like vehicle_00001#2025-01-15-10:00:01. The column family sensor_data groups all the related sensor metrics together. The column qualifier identifies the specific sensor reading, so you have separate columns for speed, fuel_level, temperature, and location.
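As a concrete illustration, here is how a client might assemble that key before writing. This is a minimal sketch: the make_row_key helper is my own name, and the zero-padded ID plus human-readable timestamp format simply match the article's example; a real application would pass the result to the Bigtable client as the row key.

```python
from datetime import datetime, timezone

def make_row_key(vehicle_id: str, ts: datetime) -> str:
    """Build a vehicle-first row key like vehicle_00001#2025-01-15-10:00:01."""
    # The vehicle ID comes first so that writes from different vehicles
    # land in different parts of the sorted key space.
    return f"{vehicle_id}#{ts.strftime('%Y-%m-%d-%H:%M:%S')}"

reading_time = datetime(2025, 1, 15, 10, 0, 1, tzinfo=timezone.utc)
key = make_row_key("vehicle_00001", reading_time)
print(key)  # vehicle_00001#2025-01-15-10:00:01
```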

Three things make this design hold up:

  1. The row key distributes writes across tablets. Because you start with vehicle_id, and you have thousands of distinct vehicles, data from different vehicles lands on different tablets while each vehicle's own readings stay contiguous. Writes are naturally spread across many tablet servers instead of clustering on one.
  2. The column family groups related sensor metrics together. All the sensor data for a given vehicle at a given timestamp is co-located, which makes reads efficient.
  3. The column qualifier identifies the specific sensor reading, so you can query for a single metric like speed or fuel level without scanning everything else.
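Because Bigtable stores rows in lexicographic key order, point 2 falls out for free: all of one vehicle's readings sit next to each other, so reading one vehicle's recent history is a cheap prefix scan. A toy in-memory sketch of that behavior (the scan_prefix helper is hypothetical; it mimics what a prefix-range read does over the sorted key space, not the real client API):

```python
from bisect import bisect_left

# Toy stand-in for Bigtable's sorted row space (illustration only).
rows = sorted([
    "vehicle_00001#2025-01-15-10:00:01",
    "vehicle_00001#2025-01-15-10:00:04",
    "vehicle_00002#2025-01-15-10:00:01",
    "vehicle_00002#2025-01-15-10:00:03",
])

def scan_prefix(rows, prefix):
    """Return all row keys starting with prefix, as a contiguous range scan would."""
    start = bisect_left(rows, prefix)  # jump straight to the first matching key
    out = []
    for key in rows[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matches form one contiguous run
        out.append(key)
    return out

print(scan_prefix(rows, "vehicle_00001#"))
```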

Why the other row keys fail

The Professional Cloud Architect exam will usually offer you three or four row key options and expect you to know which ones are wrong and why. There are two failure modes worth recognizing.

Failure 1: timestamp-first

Consider timestamp#vehicle_id. Look at what happens at any given moment:

2025-01-15-10:00:01#vehicle_00001  -> Tablet 1
2025-01-15-10:00:01#vehicle_00002  -> Tablet 1
2025-01-15-10:00:01#vehicle_50000  -> Tablet 1

All 50,000 vehicles writing at the current timestamp share the same prefix, so they all target the same tablet. That creates a severe hotspot. One tablet is overloaded while the others sit idle. This is the canonical Bigtable anti-pattern, and the exam wants you to identify it on sight.
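You can see the problem with a quick sketch. Every key generated in a given second shares one timestamp prefix, so in the sorted key space those keys are all adjacent, inside a single tablet's range. The prefix-counting below is my own illustration of that, not client code:

```python
timestamp = "2025-01-15-10:00:01"

# Timestamp-first keys for every vehicle reporting in the same second.
keys = [f"{timestamp}#vehicle_{i:05d}" for i in range(1, 50001)]

# All 50,000 keys share one prefix, so they form one contiguous run in
# the sorted key space -- exactly one tablet absorbs every write.
prefixes = {k.split("#")[0] for k in keys}
print(len(prefixes))  # 1
```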

Failure 2: low-cardinality prefix

Consider vehicle_type#timestamp:

sedan#2025-01-15-10:00:01  -> Tablet 1
sedan#2025-01-15-10:00:02  -> Tablet 1
truck#2025-01-15-10:00:01  -> Tablet 2

Vehicle type has very low cardinality. You only have a handful of values like sedan, truck, and van. That means writes can only spread across as many tablets as you have vehicle types, not across the thousands of unique vehicle identifiers you actually have. The distribution is capped at a tiny number, so most of your tablet capacity is wasted. Worse, the key contains no vehicle identifier at all, so two sedans reporting in the same second would collide on the same row key and overwrite each other.

Why vehicle_id-first wins

Compare it to the correct row key:

vehicle_00001#2025-01-15-10:00:01  -> Tablet 1
vehicle_00002#2025-01-15-10:00:01  -> Tablet 2
vehicle_50000#2025-01-15-10:00:01  -> Tablet N

Even though all of these are at the same timestamp, the prefixes are all different because each vehicle ID is unique. Writes get distributed naturally across the cluster.
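A small simulation makes this concrete. Bigtable splits the sorted key space into contiguous tablet ranges; the build_tablets and tablet_for helpers below are a toy model of that splitting (my own sketch, not the real rebalancing algorithm), showing that vehicle-first keys written in the same second scatter across every tablet:

```python
import bisect

def build_tablets(keys, num_tablets):
    """Toy model: split a sorted key space into num_tablets contiguous ranges."""
    ordered = sorted(keys)
    size = len(ordered) // num_tablets
    # Every size-th key becomes a tablet boundary.
    return [ordered[i * size] for i in range(1, num_tablets)]

def tablet_for(key, boundaries):
    """Index of the contiguous range a key falls into."""
    return bisect.bisect_right(boundaries, key)

# Historical data: 1,000 vehicles, six past readings each.
history = [f"vehicle_{v:05d}#2025-01-15-09:{m:02d}:00"
           for v in range(1, 1001) for m in range(0, 60, 10)]
boundaries = build_tablets(history, num_tablets=10)

# Now every vehicle writes at the same new second.
now_keys = [f"vehicle_{v:05d}#2025-01-15-10:00:01" for v in range(1, 1001)]
tablets_hit = {tablet_for(k, boundaries) for k in now_keys}
print(len(tablets_hit))  # 10 -- the simultaneous writes hit every tablet
```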

The principle to take into the exam

The rule for Bigtable schema design that the Professional Cloud Architect exam tests over and over: start your row key with a high-cardinality identifier that distributes writes across many tablets. In the fleet-of-cars scenario, that identifier is vehicle_id. A reversed timestamp can also appear in a correct answer, but typically as a suffix after the high-cardinality identifier, so that each vehicle's newest readings sort first; a timestamp at the front of the key, reversed or not, still funnels all current writes to one spot. The point is to avoid any prefix that all your writers will share at the same moment, and to avoid any prefix whose distinct values are too few to spread the load.
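A reversed timestamp is usually built by subtracting the real timestamp from a large constant, so that lexicographic order becomes newest-first. A minimal sketch, where the MAX_MILLIS constant and the reverse_ts_key helper are my own illustration:

```python
from datetime import datetime, timezone

# Any constant larger than every timestamp you will ever store (assumed here).
MAX_MILLIS = 10**13

def reverse_ts_key(vehicle_id: str, ts: datetime) -> str:
    """Row key whose timestamp component sorts newest-first."""
    millis = int(ts.timestamp() * 1000)
    # Zero-pad so the reversed values still compare correctly as strings.
    return f"{vehicle_id}#{MAX_MILLIS - millis:013d}"

older = reverse_ts_key("vehicle_00001", datetime(2025, 1, 15, 10, 0, 1, tzinfo=timezone.utc))
newer = reverse_ts_key("vehicle_00001", datetime(2025, 1, 15, 10, 0, 2, tzinfo=timezone.utc))
print(newer < older)  # True: the newest reading sorts first within the vehicle
```

With this layout, "give me the latest N readings for vehicle X" becomes a prefix scan that can stop after N rows.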

If you can recite this scenario and the two failure modes, you can answer almost any Bigtable row key question the exam throws at you.

My Professional Cloud Architect course covers Bigtable schema design alongside the rest of the advanced architecture material.
