
Bigtable organizes its resources in a hierarchy, and the Professional Cloud Database Engineer exam tends to test whether you know which property is set at which level. An instance is the top-level container for your data. Inside an instance you have one or more clusters, and inside each cluster you have nodes. The storage type is fixed at the instance level, clusters are tied to specific zones, and nodes supply the compute that determines throughput. Getting these layers straight makes most provisioning questions straightforward, because the answer usually comes down to knowing where a setting lives and what it controls.
The instance is the logical grouping that holds everything else. When you create one, you choose its storage type, either SSD or HDD, and that choice applies to the whole instance. A cluster represents the actual Bigtable service running in a single zone, and because one instance can hold multiple clusters, you can place copies of your service in different zones. Nodes belong to a cluster and provide the compute that handles incoming requests and runs background maintenance.
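The hierarchy described above can be sketched as a small data model. This is only an illustration of which property lives at which level, not the Bigtable client library; the class names, field names, and the example IDs and zones are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StorageType(Enum):
    SSD = "SSD"
    HDD = "HDD"


@dataclass
class Cluster:
    cluster_id: str
    zone: str    # each cluster runs in exactly one zone
    nodes: int   # compute is provisioned per cluster


@dataclass
class Instance:
    instance_id: str
    storage_type: StorageType  # chosen at creation, applies to the whole instance
    clusters: List[Cluster] = field(default_factory=list)

    def zones(self) -> List[str]:
        """Zones in which this instance has a cluster."""
        return [c.zone for c in self.clusters]

    def total_nodes(self) -> int:
        """Sum of the node counts across all clusters."""
        return sum(c.nodes for c in self.clusters)


# Hypothetical instance with two clusters in different zones.
instance = Instance(
    instance_id="orders-prod",
    storage_type=StorageType.SSD,
    clusters=[
        Cluster("orders-east-b", zone="us-east1-b", nodes=3),
        Cluster("orders-east-c", zone="us-east1-c", nodes=3),
    ],
)
print(instance.zones(), instance.total_nodes())
```

Note that the storage type appears once, on the instance, while zone and node count appear on each cluster.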
One detail that matters for capacity planning is that Bigtable separates storage from compute. Adding nodes increases processing power without moving or rebalancing the underlying data, so you scale throughput by changing node count rather than by reshuffling storage. The node count you set directly determines total throughput and how many concurrent requests a cluster can handle.
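As a rough sketch of that relationship, the arithmetic below treats throughput as proportional to node count. The per-node figure is a caller-supplied assumption, not a number from this article, and the function names are hypothetical.

```python
import math


def cluster_throughput(nodes: int, per_node_qps: float) -> float:
    """Estimated cluster throughput, assuming it scales linearly with nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * per_node_qps


def nodes_for_target(target_qps: float, per_node_qps: float) -> int:
    """Smallest node count whose estimated throughput meets the target."""
    if per_node_qps <= 0:
        raise ValueError("per_node_qps must be positive")
    return max(1, math.ceil(target_qps / per_node_qps))


# 10_000 requests per second per node is a made-up planning figure.
PER_NODE_QPS = 10_000
print(cluster_throughput(3, PER_NODE_QPS))
print(nodes_for_target(25_000, PER_NODE_QPS))
```

Because storage is separate from compute, the only input that changes when you scale is the node count.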
Bigtable offers two instance types. A development instance runs a single node, which keeps cost low, but it has no replication and no throughput guarantee. It is meant for testing and low-cost development work where redundancy and high performance are not requirements. A production instance is built for operational workloads. It has one or more clusters with three or more nodes per cluster, it supports replication, and it comes with a throughput guarantee. If a question describes a workload that needs reliability and predictable performance, the production type is the fit, while the development type is the answer when the priority is minimal cost for testing.
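The exam-style reasoning in the paragraph above can be written as a pair of small checks. This sketch encodes only the rules as stated here; the function names are hypothetical, and it does not consult the product's current limits.

```python
from typing import List


def choose_instance_type(needs_replication: bool,
                         needs_throughput_guarantee: bool) -> str:
    """Pick the instance type using the rules stated in the text."""
    if needs_replication or needs_throughput_guarantee:
        return "production"
    return "development"  # minimal cost for testing


def layout_matches_type(instance_type: str, nodes_per_cluster: List[int]) -> bool:
    """Check a cluster layout against the rules stated in the text."""
    if instance_type == "development":
        # A single cluster with a single node, so no replication.
        return nodes_per_cluster == [1]
    if instance_type == "production":
        # One or more clusters, each with three or more nodes.
        return len(nodes_per_cluster) >= 1 and all(n >= 3 for n in nodes_per_cluster)
    raise ValueError(f"unknown instance type: {instance_type}")


print(choose_instance_type(needs_replication=True, needs_throughput_guarantee=False))
print(layout_matches_type("production", [3, 3]))
```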
The storage type decision almost always resolves in favor of SSD, because it provides faster access, which benefits workloads that need low latency and high performance. HDD only becomes worth considering in a narrow case: the dataset is larger than 10 TB, it is accessed infrequently, and latency is not a concern. In that specific situation HDD can offer cost savings over SSD. Outside of those conditions, SSD is the default, and the exam treats large, rarely accessed, latency-tolerant data as the signal for HDD.
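That decision rule is simple enough to express directly. The sketch below is a study aid that mirrors the three conditions in the paragraph above; the function and parameter names are hypothetical.

```python
HDD_MIN_DATASET_TB = 10  # threshold from the text: larger than 10 TB


def choose_storage_type(dataset_tb: float,
                        infrequent_access: bool,
                        latency_sensitive: bool) -> str:
    """Return "HDD" only when all three HDD conditions hold, else "SSD"."""
    if (dataset_tb > HDD_MIN_DATASET_TB
            and infrequent_access
            and not latency_sensitive):
        return "HDD"
    return "SSD"


# Large, rarely read, latency-tolerant archive: the HDD signal.
print(choose_storage_type(50, infrequent_access=True, latency_sensitive=False))
# Large but latency-sensitive: still SSD.
print(choose_storage_type(50, infrequent_access=True, latency_sensitive=True))
```

All three conditions must hold together; failing any one of them falls back to SSD.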
Each cluster is placed in a specific zone, and that location cannot be changed after the cluster is created. Bigtable clusters are zonal resources, so if a zone becomes unavailable, the cluster in that zone is unavailable until service is restored. To reduce latency and improve availability, you store data close to the users and services that access it most often, which for data written from the eastern United States would mean placing the cluster in an eastern United States zone.
For higher availability you can run multiple clusters in different zones within the same instance. Adding a second cluster automatically enables replication, which copies data between the clusters in the background so they converge on the same data without you managing the data movement yourself. With more than one cluster, requests can be routed or balanced across zones, letting you direct traffic to the cluster closest to your users or spread the load so no single cluster becomes a bottleneck. The point to remember for the Professional Cloud Database Engineer exam is that replication in Bigtable follows from having more than one cluster, and that a single zonal cluster carries no protection against a zone outage.
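The two exam points in this paragraph can be checked with a toy model that represents an instance as a list of the zones its clusters run in. The function names and zone names are hypothetical, and the model ignores everything except cluster placement.

```python
from typing import List


def replication_enabled(cluster_zones: List[str]) -> bool:
    """Replication follows from having more than one cluster."""
    return len(cluster_zones) > 1


def clusters_still_serving(cluster_zones: List[str], failed_zone: str) -> List[str]:
    """Zones whose cluster remains available when one zone goes down."""
    return [zone for zone in cluster_zones if zone != failed_zone]


def survives_zone_outage(cluster_zones: List[str], failed_zone: str) -> bool:
    """True if at least one cluster is outside the failed zone."""
    return len(clusters_still_serving(cluster_zones, failed_zone)) > 0


single = ["us-east1-b"]
replicated = ["us-east1-b", "us-east1-c"]

print(replication_enabled(single), survives_zone_outage(single, "us-east1-b"))
print(replication_enabled(replicated), survives_zone_outage(replicated, "us-east1-b"))
```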
Scaling in Bigtable is horizontal rather than vertical, meaning you add nodes rather than enlarge a single machine. With manual scaling you add nodes to a cluster yourself based on need. Each node handles a portion of the work, so more nodes improve write throughput and let the cluster manage higher traffic and more concurrent writes during peak loads.
Autoscaling adjusts the node count for you once you configure a few settings. You set the minimum and maximum number of nodes allowed in the cluster, and those serve as hard limits that Bigtable will not go below or above. You also set targets for CPU utilization and storage utilization. Bigtable monitors the cluster's actual CPU and storage usage, and when utilization rises above or falls below the targets, it adds or removes nodes to hold performance steady, always staying within the min and max you defined. This gives you enough capacity during peak hours while reducing cost when traffic is low. For the exam, the points worth remembering are that the min and max are hard limits and that the scaling decision is driven by CPU and storage utilization targets.
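To make the interaction between targets and limits concrete, here is a simplified model of an autoscaling decision. It uses a plain proportional rule (scale the node count by observed utilization over target utilization), which is an assumption made for illustration and not Bigtable's actual algorithm. All names are hypothetical, and utilization is given in whole percent to keep the arithmetic exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalingConfig:
    min_nodes: int           # hard lower limit
    max_nodes: int           # hard upper limit
    cpu_target_pct: int      # target CPU utilization, in percent
    storage_target_pct: int  # target storage utilization, in percent

    def __post_init__(self) -> None:
        if not 1 <= self.min_nodes <= self.max_nodes:
            raise ValueError("require 1 <= min_nodes <= max_nodes")
        if self.cpu_target_pct <= 0 or self.storage_target_pct <= 0:
            raise ValueError("targets must be positive")


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer division rounded up."""
    return -(-numerator // denominator)


def desired_nodes(cfg: AutoscalingConfig, current_nodes: int,
                  cpu_pct: int, storage_pct: int) -> int:
    """Node count that would bring both signals back to target, clamped."""
    by_cpu = _ceil_div(current_nodes * cpu_pct, cfg.cpu_target_pct)
    by_storage = _ceil_div(current_nodes * storage_pct, cfg.storage_target_pct)
    wanted = max(by_cpu, by_storage)  # satisfy the more demanding signal
    return max(cfg.min_nodes, min(cfg.max_nodes, wanted))


cfg = AutoscalingConfig(min_nodes=3, max_nodes=10,
                        cpu_target_pct=60, storage_target_pct=70)
print(desired_nodes(cfg, current_nodes=4, cpu_pct=90, storage_pct=35))  # peak load
print(desired_nodes(cfg, current_nodes=4, cpu_pct=15, storage_pct=20))  # quiet period
print(desired_nodes(cfg, current_nodes=8, cpu_pct=95, storage_pct=50))  # over the cap
```

In the quiet-period call the proportional rule asks for fewer nodes than the minimum, and in the last call it asks for more than the maximum. The clamp in the final line of `desired_nodes` is what makes the min and max behave as hard limits.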