
AlloyDB organizes its resources into a hierarchy of clusters, instances, and nodes, and understanding how those pieces fit together is what most of the provisioning questions on the Professional Cloud Database Engineer exam come down to. A cluster is the top-level container. Instances are the connection points that applications talk to. Nodes are the virtual machines that actually run the database engine. The exam tends to test whether you can pick the right combination of these for a given requirement, so it is worth being precise about what each one does and where its boundaries are.
An AlloyDB cluster is a regional container that holds your databases, logs, and metadata. Everything inside a cluster stays within a single Google Cloud region, so when you create a cluster you are choosing the region those resources live in. The cluster stores all of its data in a distributed, multi-zone storage layer that is separate from the compute nodes. That storage layer is regional, and it scales automatically to meet capacity needs without manual intervention.
The separation between storage and compute is the detail that makes the rest of the model work. Because the data lives in the regional storage layer rather than on the nodes, multiple instances across different zones can all access the same underlying data. You are not copying data to each instance. You are pointing additional compute at storage that already holds everything.
An instance serves as the connection point for applications and provides a stable IP address for your workloads. There are two kinds. A primary instance supports read and write operations. A read pool instance provides read-only access and exists to handle heavy query volume. A cluster has one primary instance, and you add read pool instances when you need to scale out reads.
A primary instance can be configured as either basic or highly available. The basic configuration is meant for development. It contains a single active node that connects to the regional storage layer, and because there is no standby, it has no redundancy. The highly available configuration is meant for production workloads. It spans two zones, with an active node in one zone and a standby node in another, so that the cluster can survive the loss of a single zone.
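The hierarchy described so far can be sketched as a small data model. This is only an illustration of the rules in the text, not the AlloyDB API: the class names, fields, and zone names below are all made up for the example, and the region check relies on the Google Cloud convention that a zone name is its region name plus a letter suffix.

```python
from dataclasses import dataclass, field
from enum import Enum


class InstanceKind(Enum):
    PRIMARY = "primary"      # read and write
    READ_POOL = "read_pool"  # read-only


@dataclass
class Instance:
    name: str
    kind: InstanceKind
    node_zones: list[str]            # one entry per node (hypothetical zones)
    highly_available: bool = False   # only meaningful for a primary

    @property
    def accepts_writes(self) -> bool:
        return self.kind is InstanceKind.PRIMARY


@dataclass
class Cluster:
    name: str
    region: str
    instances: list[Instance] = field(default_factory=list)

    def add(self, instance: Instance) -> None:
        # A cluster has one primary instance.
        if instance.kind is InstanceKind.PRIMARY and any(
            i.kind is InstanceKind.PRIMARY for i in self.instances
        ):
            raise ValueError("cluster already has a primary instance")
        # Everything in a cluster stays inside the cluster's region.
        for zone in instance.node_zones:
            if not zone.startswith(self.region + "-"):
                raise ValueError(f"zone {zone} is outside region {self.region}")
        # A highly available primary spans two zones.
        if instance.highly_available and len(set(instance.node_zones)) < 2:
            raise ValueError("highly available primary needs two zones")
        self.instances.append(instance)


cluster = Cluster(name="orders", region="us-central1")
cluster.add(Instance("orders-primary", InstanceKind.PRIMARY,
                     ["us-central1-a", "us-central1-b"], highly_available=True))
cluster.add(Instance("orders-reads", InstanceKind.READ_POOL,
                     ["us-central1-a", "us-central1-c", "us-central1-f"]))
```

Note that the model holds no data on the nodes themselves, which mirrors the point about storage living in the regional layer rather than on compute.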
In a highly available primary instance, an active node runs in one zone and handles all read and write traffic, while a standby node sits provisioned and ready in a second zone. A health monitoring system periodically checks the active node. If the node fails multiple consecutive checks, or if the zone it runs in has an outage, AlloyDB promotes the standby node in the second zone to the active role. Traffic is redirected automatically to the new active node, which begins accepting connections.
After the promotion, a standby node is recreated in the zone that was previously down, which restores redundancy and prepares the cluster for any subsequent failure. Failover typically completes in under 30 seconds with no data loss, and it happens without manual intervention. For the exam, the points worth holding onto are that high availability is a two-zone, single-region arrangement, that the standby is a failover target rather than a source of extra read capacity, and that the recovery is automatic and fast.
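The failover sequence can be sketched as a toy state machine. This is a simplified model of the behavior described above, not AlloyDB's actual implementation: the `HaPrimary` class is hypothetical, and the threshold of three consecutive failed checks is an assumed value, since the text only says "multiple".

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # assumed value; the real threshold is not stated


@dataclass
class HaPrimary:
    active_zone: str
    standby_zone: str
    consecutive_failures: int = 0

    def record_health_check(self, healthy: bool) -> bool:
        """Record one health check; return True if it triggered a failover."""
        if healthy:
            # Any successful check resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self._fail_over()
            return True
        return False

    def _fail_over(self) -> None:
        # The standby becomes the active node, and a new standby is
        # recreated in the zone the old active node ran in.
        self.active_zone, self.standby_zone = self.standby_zone, self.active_zone
        self.consecutive_failures = 0


primary = HaPrimary(active_zone="us-central1-a", standby_zone="us-central1-b")
checks = [False, True, False, False, False]
results = [primary.record_health_check(ok) for ok in checks]
```

A single failed check followed by a healthy one does not cause a promotion here; only the unbroken run of three failures at the end does, after which the two zones have swapped roles.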
Read pool instances are how you scale read-only traffic. Each read pool instance contains multiple nodes, and by adjusting the number of nodes you adjust the compute power dedicated to reads. Because the storage layer is regional, read pool nodes in any zone read from the same data the primary writes to, so adding read capacity does not involve replicating the data set to each pool.
There is a hard limit to be aware of. A cluster supports a maximum of 20 nodes across all of its read pools combined. That ceiling applies to the total, not to a single pool, so when you plan capacity you are budgeting those 20 nodes across however many read pool instances you create. Sizing read pools is about matching node count to actual read demand rather than over-provisioning for peak, which keeps cost in line with the load you really have.
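The budgeting arithmetic is simple enough to express as a quick check. The function and pool names below are hypothetical; only the 20-node ceiling comes from the text.

```python
MAX_READ_POOL_NODES_PER_CLUSTER = 20  # cluster-wide ceiling stated above


def remaining_read_nodes(pool_sizes: dict[str, int]) -> int:
    """Return how many read pool nodes are still available in the cluster.

    pool_sizes maps a read pool instance name to its node count.
    """
    total = sum(pool_sizes.values())
    if total > MAX_READ_POOL_NODES_PER_CLUSTER:
        raise ValueError(
            f"{total} nodes requested, but the cluster-wide limit is "
            f"{MAX_READ_POOL_NODES_PER_CLUSTER}"
        )
    return MAX_READ_POOL_NODES_PER_CLUSTER - total


# Two pools that fit within the budget: 8 + 6 = 14 nodes used.
left = remaining_read_nodes({"reporting": 8, "api": 6})
```

Two pools of 12 and 9 nodes would each look reasonable on their own, but together they ask for 21 nodes and the check rejects them, which is the kind of detail a scenario question can hinge on.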
You can manage that node count in two ways. Manual scaling lets you explicitly define how many nodes are active at any given time. Autoscaling lets Google Cloud manage the node count for you and removes the overhead of watching traffic levels yourself. Autoscaling comes in two forms. CPU-utilization-based scaling adds or removes nodes to maintain a defined CPU utilization target, so the pool grows during spikes and shrinks when load decreases. Schedule-based scaling enforces a minimum node count during set time windows, which prevents the pool from scaling below that level when you already know you will need the capacity. The two forms complement each other: CPU-based scaling reacts to load, while schedule-based scaling guarantees a floor during known busy periods.
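How the two autoscaling forms interact can be sketched as a single decision function. This is an assumption-heavy illustration: the proportional formula is a generic target-tracking rule, not AlloyDB's documented algorithm, and `ScheduleWindow`, `desired_nodes`, and all parameter names are invented for the example. CPU figures are whole percentages so the arithmetic stays in integers.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScheduleWindow:
    start: time
    end: time
    min_nodes: int  # floor enforced while the window is active

    def covers(self, now: time) -> bool:
        return self.start <= now < self.end


def desired_nodes(current_nodes: int, cpu_percent: int, target_percent: int,
                  now: time, windows: list[ScheduleWindow],
                  min_nodes: int = 1, max_nodes: int = 20) -> int:
    # CPU-based part: scale the node count in proportion to how far
    # utilization is from the target (integer ceiling division).
    cpu_based = (current_nodes * cpu_percent + target_percent - 1) // target_percent
    # Schedule-based part: the highest floor among active windows.
    floor = max([min_nodes] + [w.min_nodes for w in windows if w.covers(now)])
    # The schedule only raises the minimum; the maximum still caps the result.
    return min(max(cpu_based, floor), max_nodes)


business_hours = ScheduleWindow(start=time(9, 0), end=time(17, 0), min_nodes=6)

quiet_night = desired_nodes(4, 30, 60, time(3, 0), [business_hours])
quiet_workday = desired_nodes(4, 30, 60, time(10, 0), [business_hours])
night_spike = desired_nodes(4, 90, 60, time(3, 0), [business_hours])
```

At 30% CPU against a 60% target the CPU rule alone would shrink the pool to 2 nodes, and at 3:00 it does. At 10:00 the same load still yields 6 nodes because the schedule window holds the floor, while a 90% spike at night grows the pool to 6 on CPU alone.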
When you are working through provisioning scenarios for the Professional Cloud Database Engineer exam, it helps to keep the layers distinct. The cluster sets the region and owns the regional storage. The primary instance handles reads and writes and, when configured as highly available, provides redundancy across two zones. Read pool instances, capped at 20 nodes in total and scaled manually or automatically, handle read volume.