Bigtable's biggest design constraint is that only the row key is indexed. There are no secondary indexes. Whatever you want to query by has to be encoded in the row key, or you accept that the query becomes a full table scan. That makes row key design the single most important decision when you set up a Bigtable table, and the most likely thing to go wrong is hotspotting. The Associate Cloud Engineer exam tests this specifically. This article covers what hotspotting is, why row keys cause it, and the patterns that avoid it.
It does not cover every aspect of Bigtable schema design, secondary indexes via materialized views, or the operational details of repartitioning a hot table. The goal is what the Associate Cloud Engineer exam expects you to know.
Bigtable distributes data across many nodes, and the row key decides which node holds which data. Rows are stored in lexicographic order of their keys, and contiguous ranges of keys are assigned to specific nodes. This is what makes scans efficient: a range scan over consecutive keys only has to talk to one or a few nodes.
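As a rough sketch of what that locality buys you, here is a prefix-range read with the Python Bigtable client. The project, instance, table, and key prefix are placeholders, and this assumes the client's read_rows call with start_key and end_key.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")           # placeholder project
table = client.instance("my-instance").table("events")   # placeholder instance/table

# Keys are stored in sorted order, so every key with the "user123#"
# prefix falls in one contiguous slice of the key space; this scan
# only touches the node(s) that own that slice.
rows = table.read_rows(start_key=b"user123#", end_key=b"user123$")
for row in rows:
    print(row.row_key)
```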
The problem is, that same property creates a failure mode. If your application's writes or reads are concentrated on a narrow range of row keys, all that traffic hits a small number of nodes. Those nodes get overloaded while the rest of the cluster sits idle. That is hotspotting. The cluster has plenty of total capacity, but the workload is funneled into a fraction of it.
Hotspotting is the most common Bigtable performance problem, and it is almost always caused by a bad row key.
The Associate Cloud Engineer exam calls out three row key patterns that cause hotspotting.
Sequential numbers are the worst offender. If your row keys are 1, 2, 3, 4, and so on, then every new write goes to the node that owns the highest key range. That node is hot. Every other node is cold. As traffic grows, the hot node falls behind, and the cluster's total capacity is irrelevant.
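A pure-Python sketch of why this happens: the only thing Bigtable does with keys is sort them, so a monotonically increasing key is always the largest key seen so far, while a hashed key lands somewhere unpredictable in the key space.

```python
import hashlib

# Monotonically increasing, zero-padded keys: the newest key always sorts
# last, so every new write lands at the tail of the key space, on one node.
sequential_keys = [f"{n:010d}" for n in range(1, 1002)]
assert sorted(sequential_keys)[-1] == sequential_keys[-1]

# Hashing the same values scatters them: the newest row no longer sorts
# to the end, so writes spread across the whole key space.
hashed_keys = [hashlib.md5(k.encode()).hexdigest() for k in sequential_keys]
position = sorted(hashed_keys).index(hashed_keys[-1])
print(f"newest hashed key sorts at position {position} of {len(hashed_keys)}")
```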
Domain names that are not reversed are bad for a similar reason. If your row keys are www.site1.com, www.site2.com, www.site3.com, nearly every key starts with the same www. prefix, and the rows that belong to one high-traffic site sit in a single narrow range, so that site's traffic lands on one node.
Keys that need to be updated frequently are also problematic. Updating a row key is not really an update. It is a delete of the old row plus a write of a new row. That doubles the work and creates a moving hot spot as the active key range shifts.
Three patterns avoid hotspotting and are the ones the exam expects you to know.
Reverse domain names. Instead of www.mywebsite.com, store it as com.mywebsite.www. The leading bytes are now the domain rather than the near-universal www. subdomain, so keys no longer crowd into one narrow prefix range, and all rows for the same site sort next to each other. This exact example is the canonical one, and it is the form exam answers tend to use.
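A minimal sketch of the transformation; the helper name is made up for illustration.

```python
def reverse_domain(domain: str) -> str:
    """Turn www.mywebsite.com into com.mywebsite.www."""
    return ".".join(reversed(domain.split(".")))

assert reverse_domain("www.mywebsite.com") == "com.mywebsite.www"
# Rows for the same site now share a prefix and sort together:
# com.mywebsite.www, com.mywebsite.blog, com.mywebsite.shop, ...
```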
Timestamps, but not at the front. A pure timestamp at the front of the key creates the same problem as a sequential number, because the most recent timestamp always lives at the end of the key range. The fix is to put something else first, like a user ID or a hash, and the timestamp later. The timestamp still gets you the temporal ordering you want for scans, without concentrating writes.
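One way to build such a key, sketched with a made-up delimiter and zero-padded seconds so that lexicographic order matches chronological order.

```python
import time

def event_row_key(user_id: str, event_ts: float) -> bytes:
    # Identifier first spreads writes across users; zero-padded seconds
    # after it keep one user's events in chronological order for scans.
    return f"{user_id}#{int(event_ts):010d}".encode()

key = event_row_key("user123", time.time())
# A range scan over the b"user123#" prefix returns that user's events
# oldest-first, without touching other users' key ranges.
```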
String identifiers, especially hashed or random ones. A user ID, a UUID, or a hash of some natural key all spread writes evenly across the key space.
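Two sketches of what that looks like in practice, one random and one deterministic; the field names are illustrative.

```python
import hashlib
import uuid

# A random identifier: effectively uniform across the key space, but you
# have to record it somewhere to find the row again.
row_key_random = uuid.uuid4().hex.encode()

# A deterministic hash of a natural key: also uniform, and the same input
# always maps to the same row, so point lookups by email still work.
email = "alice@example.com"
row_key_hashed = hashlib.sha256(email.encode()).hexdigest().encode()
```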
A pattern that comes up specifically for time-series data is reversing the timestamp by subtracting it from a large constant, so that newer rows come first in scan order. This is mostly an optimization for "give me the latest data" queries, not a hotspotting fix on its own. If you reverse the timestamp but still put it at the front of the key, you have just inverted which end of the key range is hot. To actually avoid hotspotting in time-series data, combine the reversed timestamp with a leading identifier.
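A sketch of that combination: a leading identifier for distribution, plus a reversed timestamp built from a constant (chosen here arbitrarily) larger than any timestamp you will ever store.

```python
MAX_TS = 9_999_999_999  # arbitrary constant larger than any stored Unix timestamp

def latest_first_key(sensor_id: str, event_ts: float) -> bytes:
    # The leading identifier handles distribution; the reversed timestamp
    # makes newer readings sort first within that sensor's key range.
    reversed_ts = MAX_TS - int(event_ts)
    return f"{sensor_id}#{reversed_ts:010d}".encode()

# A prefix scan on b"sensor42#" now returns the most recent readings first,
# which is exactly the "give me the latest data" query.
```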
Salting is the technique of prepending a small random or hash-based value to the row key to spread writes. If you have an unavoidable natural ordering, you can salt the key with a hash of the rest of the key, modulo some small number N. That distributes writes across N key ranges while still keeping rows for the same logical entity grouped, since the same input always produces the same salt.
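A sketch of a salted key, assuming an arbitrary bucket count of 8 and using MD5 purely as a cheap, stable hash.

```python
import hashlib

NUM_SALT_BUCKETS = 8  # arbitrary small number of write partitions

def salted_key(natural_key: str) -> bytes:
    # The salt is derived from the key itself, so rows for the same logical
    # entity always land in the same bucket and stay grouped.
    bucket = int(hashlib.md5(natural_key.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return f"{bucket}#{natural_key}".encode()

# Point reads still work (recompute the salt from the natural key), but a
# scan across the natural ordering now needs NUM_SALT_BUCKETS separate scans.
```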
Salting is a tradeoff. It improves write distribution at the cost of making scans across the salted dimension harder, since the data is now sharded across multiple ranges. For high-throughput writes where scan locality matters less, it is the right move.
One last point that is not directly about row keys but matters for schema design. Bigtable tables are sparse. Empty cells cost no storage. That means you can create as many column qualifiers as you need without worrying about wasted space. This frees you to push more information into the row, rather than spreading it across multiple rows that need their own carefully designed keys.
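A sketch with the Python client to make the point concrete; the project, instance, table, and the attrs column family are placeholders and would have to exist already.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")              # placeholder project
table = client.instance("my-instance").table("profiles")    # placeholder names

row = table.direct_row(b"com.mywebsite.www#user123")
row.set_cell("attrs", "country", b"DE")                      # "attrs" family assumed to exist
row.set_cell("attrs", "last_login", b"2024-01-01T00:00:00Z")
# Nothing is written for the hundreds of attributes this user doesn't have;
# absent qualifiers simply don't exist for this row and cost nothing.
row.commit()
```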
If you see a question describing a Bigtable workload with poor write performance, uneven node utilization, or "all the writes hitting one node," the answer is a row key redesign. The phrase "hotspotting" is the strongest signal, and the fix is one of the patterns above.
If you see a question asking which row key design is correct, the right answer is usually a reversed domain name, a key with the timestamp somewhere other than the front, or a hashed identifier. The wrong answers are sequential numbers, unreversed domain names, or anything that produces a narrow active range.
If you see a question asking what makes Bigtable schema design unusual, the answer involves the row key being the only index. Everything else follows from that constraint.
Only the row key is indexed in Bigtable, which is why row key design matters so much. Avoid sequential numbers, unreversed domain names, and frequently updated keys. Use reversed domain names, timestamps that are not at the front, and hashed or random string identifiers. The Associate Cloud Engineer exam tests this in pretty stylized scenarios, and once you know the patterns, the answers are usually obvious.
My Associate Cloud Engineer course covers Bigtable schema design alongside the use cases and access patterns the Associate Cloud Engineer exam tests.