MIG Autoscaling: Health Checks, Initial Delay, and Preventing Over-Provisioning

Ben Makansi
December 15, 2025

Autoscaling with Managed Instance Groups sounds straightforward: add VMs when load is high, remove them when load drops. In practice, there is a specific failure mode that trips up people implementing autoscaling for the first time, and it shows up on the Associate Cloud Engineer exam: over-provisioning caused by health checks firing before new VMs have finished booting. Understanding how this happens and what the initial delay setting does to prevent it is the core of what the exam tests on this topic.

How Autoscaling Decides to Scale

The autoscaler in a MIG continuously monitors a scaling signal - usually CPU utilization, but potentially a custom metric. When the signal exceeds the target value (for example, CPU stays above 70 percent across the group), the autoscaler calculates how many additional VMs are needed to bring utilization down to target and creates them from the instance template.
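The arithmetic behind that decision can be sketched in a few lines. Google's documentation describes target tracking as picking roughly the smallest group size that brings average utilization back to the target; this sketch omits the stabilization windows the real autoscaler also applies:

```python
import math

def recommended_size(current_instances, avg_utilization, target):
    """Smallest group size that would bring average utilization
    back down to the target, assuming total load stays constant."""
    return math.ceil(current_instances * avg_utilization / target)

# 4 VMs averaging 84% CPU against a 70% target -> scale out to 5
print(recommended_size(4, 0.84, 0.70))
```

The key property is that the recommendation is driven entirely by the current signal: if the signal has not yet settled, neither has the recommendation.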

The new VMs take time to boot - running their startup script, initializing the application, and getting ready to serve traffic. During this boot period, the VMs exist but are not yet contributing to the group's capacity. If the autoscaler is aggressive and sees that utilization is still high while the new VMs are booting, it might try to add even more VMs before the first batch has finished starting. This can lead to a situation where far more VMs are created than are actually needed, because the signal has not yet stabilized.

The Over-Provisioning Problem

Over-provisioning happens when the autoscaler creates too many instances because it does not account for instances that are still booting. Here is the sequence: traffic spikes, CPU goes high, autoscaler adds five VMs. Those five VMs take 90 seconds to boot. During those 90 seconds, CPU is still high because the new VMs are not serving traffic yet. The autoscaler sees high CPU, adds five more VMs. Now there are ten new VMs booting. When all ten finish booting, the load is distributed across the full group and CPU drops dramatically - but now you have ten extra VMs that are not needed. You are paying for them, and the scale-down process will eventually remove them, but the temporary over-provisioning wasted money and created unnecessary churn.
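The feedback loop above can be made concrete with a toy simulation. Everything here is a simplification I am assuming for illustration (fixed demand, one work unit per VM, discrete ticks); the point is only to show the difference between an autoscaler that counts booting VMs toward capacity and one that does not:

```python
import math

def simulate(boot_ticks, account_for_booting, demand=8.0, target=0.7, ticks=6):
    """Toy model of the scale-out feedback loop. `demand` is total work;
    each serving VM handles 1.0 unit at full utilization, so group
    utilization = demand / serving. A naive autoscaler
    (account_for_booting=False) keeps re-adding VMs while the first
    batch is still booting."""
    serving = 5
    booting = []  # remaining boot ticks per pending VM
    for _ in range(ticks):
        utilization = demand / serving
        # Target tracking: size needed to bring utilization back to target
        recommended = math.ceil(serving * utilization / target)
        capacity = serving + (len(booting) if account_for_booting else 0)
        if recommended > capacity:
            booting += [boot_ticks] * (recommended - capacity)
        booting = [t - 1 for t in booting]            # boot timers tick down
        serving += sum(1 for t in booting if t == 0)  # finished VMs start serving
        booting = [t for t in booting if t > 0]
    return serving + len(booting)

print(simulate(3, account_for_booting=True))   # converges to the needed size
print(simulate(3, account_for_booting=False))  # overshoots it badly
```

In this model the group needs 12 VMs to hit the target; the version that ignores booting instances keeps stacking new batches on top of the ones already in flight and ends up well past that.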

What Health Checks Do in Autoscaling

A health check is a probe - an HTTP request, a TCP connection attempt, or an HTTPS request - that the MIG sends to each instance to determine whether it is healthy and ready to receive traffic. An instance that returns a successful response to the health check is considered healthy. An instance that fails the health check is either still booting, experiencing a problem, or has crashed.

In the context of autoscaling, health checks determine when a new VM should be counted as part of the group's capacity. An instance that has not yet passed its health check is not considered available to serve traffic, and the load balancer does not route requests to it. This means that even though a new VM exists in the MIG, it is not yet reducing the load on the existing instances.

The Initial Delay Setting

The initial delay is a setting on the MIG itself, configured alongside the health check, that tells the MIG how long to wait after a VM starts before counting health check results. During the initial delay period, the VM is assumed to be booting, and health check failures are ignored rather than treated as a sign of trouble.
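The gating logic amounts to a single comparison. This is a minimal sketch with hypothetical function and state names (the real evaluation also requires several consecutive probe failures before marking an instance unhealthy):

```python
def health_verdict(now, created_at, initial_delay, probe_passed):
    """Return the MIG's view of an instance's health. During the
    initial delay window, probe failures are ignored: the instance
    is presumed to be booting rather than broken."""
    if now - created_at < initial_delay:
        return "VERIFYING"   # still inside the initial delay window
    return "HEALTHY" if probe_passed else "UNHEALTHY"

# 30s after boot with a 90s initial delay: a failed probe does not
# mark the instance unhealthy
print(health_verdict(now=30, created_at=0, initial_delay=90, probe_passed=False))
```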

Setting the initial delay correctly prevents over-provisioning. If your application typically takes 90 seconds to fully initialize after the VM boots, you set the initial delay to at least 90 seconds. The autoscaler then waits 90 seconds before checking whether the new VMs are healthy, giving them time to complete initialization before the health status influences scaling decisions.

If you set the initial delay too short, the health check runs while the application is still initializing. The VM fails the health check. The autoscaler interprets this as the instance being unhealthy and potentially creates more instances to compensate. You end up with unnecessary instances and a cycle of over-provisioning.

If you set the initial delay too long, there is a lag before the autoscaler knows that new instances are healthy. This does not cause over-provisioning, but it does mean the group is slower to recognize the new capacity. The right value sits just above your application's actual initialization time: long enough that health checks never run mid-startup, short enough that healthy instances are recognized promptly.

Custom Metrics for Autoscaling

CPU utilization is the default autoscaling signal, but it is not always the most meaningful one. Cloud Monitoring custom metrics let you scale on signals that better represent your application's actual load.

A common example is Pub/Sub queue depth. If you have a fleet of worker VMs processing messages from a Pub/Sub subscription, scaling based on the number of unacknowledged messages is more accurate than scaling on CPU. A backlog of messages means more workers are needed even if the current workers are not yet CPU-bound. When the queue clears, you scale down even if CPU is still moderate. This keeps the fleet appropriately sized for the actual work rather than for an indirect indicator.
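As a sketch, backlog-based sizing might look like the following. The one-minute drain goal, the per-worker throughput figure, and the min/max bounds are all assumptions you would tune for your own workload (in practice the backlog would come from the Cloud Monitoring metric for unacknowledged messages on the subscription):

```python
import math

def desired_workers(undelivered_messages, msgs_per_worker_per_min,
                    min_workers=1, max_workers=50):
    """Size the worker fleet so the current backlog clears in about a
    minute, clamped to the group's min/max instance bounds."""
    needed = math.ceil(undelivered_messages / msgs_per_worker_per_min)
    return max(min_workers, min(max_workers, needed))

print(desired_workers(950, 100))  # 950-message backlog, 100 msg/min each -> 10
```

Note that this recommendation is independent of CPU: a CPU-based policy would not scale out at all if the workers were I/O-bound while the backlog grew.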

Request latency is another useful custom metric. If your latency target is 200ms and the 99th percentile latency is climbing toward 400ms, that is a better signal to add capacity than waiting until CPU hits 80 percent.

How the Exam Tests This

The Associate Cloud Engineer exam presents this topic through a specific scenario: a MIG is autoscaling, but new instances are being flagged as unhealthy before they finish starting up, causing the autoscaler to add more instances than necessary. The fix is to increase the initial delay so that health checks do not run until after the application has had time to initialize.

This is a scenario where understanding the cause-and-effect relationship between the initial delay setting and over-provisioning is what distinguishes the correct answer from plausible-sounding wrong answers like increasing the maximum instance count or changing the health check protocol.

My Associate Cloud Engineer course covers MIG health checks and autoscaling configuration in detail, including how the initial delay setting interacts with scale-out behavior on the Associate Cloud Engineer exam.
