Resource Quotas on GCP for the PCA Exam

GCP Study Hub
Ben Makansi
January 29, 2026

Resource quotas are one of those topics that sounds boring until your autoscaling stops scaling and you cannot figure out why. The Professional Cloud Architect exam tests this exact scenario, so it is worth understanding what quotas are, how they fail, and where to look first when something breaks.

What a quota actually is

A resource quota in Google Cloud is a hard limit on how much of a given resource your project can consume. The limit can be on a count of things, like the number of Compute Engine instances in a region, or on a rate, like the number of API requests per minute. Quotas exist so one runaway project cannot consume so much capacity that it starves other customers in the same region.

Every GCP project starts with a default set of quotas. Some are generous, some are surprisingly tight, and the tight ones are exactly the ones that bite you under load. CPUs per region, persistent disk capacity, external IP addresses, load balancer forwarding rules, and per-API request rates are all common culprits.

Why quotas matter for autoscaling

Autoscaling sounds like it should solve capacity problems on its own. You configure a managed instance group, set a target CPU utilization, and the platform adds VMs as load grows. The catch is that autoscaling can only add VMs the project is allowed to have. The moment your group tries to scale past the regional CPU quota or the regional instance quota, new VMs stop being created.
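The arithmetic behind that failure is worth seeing once. Here is a minimal sketch, with made-up numbers, of why a managed instance group stops growing: the autoscaler can only create as many new VMs as the regional quota still has headroom for.

```python
# Hypothetical numbers illustrating an autoscaler hitting a regional
# instance quota. This is a sketch of the arithmetic, not GCP's API.

def instances_autoscaler_can_add(desired_total, current_total,
                                 quota_limit, quota_usage):
    """Return how many new VMs actually get created.

    desired_total: instances the autoscaler wants in the group
    current_total: instances the group has now
    quota_limit:   the project's regional instance quota
    quota_usage:   instances already counted against that quota
    """
    wanted = desired_total - current_total   # what autoscaling asks for
    headroom = quota_limit - quota_usage     # what the quota still allows
    return max(0, min(wanted, headroom))

# A traffic spike asks for 40 more VMs, but only 8 fit under the quota.
# The other 32 creation requests are denied, and the group plateaus.
print(instances_autoscaler_can_add(desired_total=60, current_total=20,
                                   quota_limit=28, quota_usage=20))  # 8
```

Once `headroom` reaches zero, every further scale-out attempt returns nothing, which is exactly the plateau you see from the application side.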

From the application side, this looks like a service that suddenly cannot keep up with traffic. Existing instances saturate, the load balancer runs out of healthy backends with capacity, and clients start getting 503 Service Unavailable responses. The platform is not broken. It is doing exactly what you told it to do, which is honor the quota.

The 503 troubleshooting path

When traffic spikes and a system that is designed to scale suddenly throws 503 errors, the Professional Cloud Architect exam expects a specific first move. Before you start inspecting the database, reviewing recently shipped features, or second-guessing the load balancer config, you check whether the project has hit a resource quota.

The reasoning is simple. A 503 typically means the service is temporarily unable to handle requests because of resource constraints. If autoscaling was supposed to absorb the spike and did not, the most likely cause is that the autoscaler tried to provision more capacity and was denied. Quotas are the first thing that produces that exact failure pattern.

Other checks are still useful, but they come later. A misconfigured load balancer would more likely cause uneven distribution across backends, not 503s tied to capacity. Database hot spots show up as latency on specific queries before they show up as service-wide unavailability. New feature regressions usually have a clear correlation with a deploy. Quota exhaustion correlates with traffic, which is exactly the symptom you have when a new game mode launches and player counts surge.

Where to actually look

In the Google Cloud Console, the Quotas page under IAM & Admin shows current usage and the configured limit for every quota in the project, filterable by service and region. This is the page you want bookmarked. When something is on fire, you go there, sort by usage percent, and find the row that is at or near 100 percent.
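The triage step is nothing more than computing usage as a percentage of the limit and reading from the top. As a sketch, with hypothetical quota rows in the same shape the Quotas page displays (in a script you might pull the real rows from `gcloud compute regions describe`, which includes a quotas list per region):

```python
# Hypothetical quota rows (metric, region, usage, limit), in the shape
# the console's Quotas page shows. The sort mirrors sorting that page
# by usage percent: the exhausted quota floats to the top.
quotas = [
    {"metric": "CPUS",             "region": "us-central1", "usage": 92, "limit": 96},
    {"metric": "IN_USE_ADDRESSES", "region": "us-central1", "usage": 4,  "limit": 8},
    {"metric": "INSTANCES",        "region": "us-central1", "usage": 40, "limit": 40},
]

def usage_percent(q):
    return 100.0 * q["usage"] / q["limit"]

# Hottest first: the row at or near 100 percent is your culprit.
for q in sorted(quotas, key=usage_percent, reverse=True):
    print(f'{q["metric"]:<17} {q["region"]}  {usage_percent(q):5.1f}%')
```

With these numbers, INSTANCES at 100 percent lands on the first line, which is the row that explains the 503s.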

The same data is available through the Cloud Monitoring metrics for each service, which means you can build dashboards and alerts on quota usage. The right pattern is to alert when usage crosses a threshold like 80 percent, well before you hit the wall. Waiting for the quota to be fully consumed before you find out is the version of this problem that ends up in a postmortem.
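The proactive version of that check is a threshold test, not a sort. A minimal sketch of the logic an 80 percent alert encodes (in production this would be a Cloud Monitoring alerting policy on the quota metrics; the sample data here is made up):

```python
# Sketch of an 80% quota alert: flag any quota whose usage/limit ratio
# has crossed the threshold, well before it hits the hard limit.
# Sample data is hypothetical.
ALERT_THRESHOLD = 0.80

def quotas_needing_attention(rows, threshold=ALERT_THRESHOLD):
    """Return (metric, region) pairs whose usage/limit ratio >= threshold."""
    return [(q["metric"], q["region"])
            for q in rows
            if q["usage"] / q["limit"] >= threshold]

sample = [
    {"metric": "CPUS",      "region": "us-central1",  "usage": 82, "limit": 96},
    {"metric": "INSTANCES", "region": "us-central1",  "usage": 35, "limit": 40},
    {"metric": "CPUS",      "region": "europe-west1", "usage": 10, "limit": 96},
]
print(quotas_needing_attention(sample))
# -> [('CPUS', 'us-central1'), ('INSTANCES', 'us-central1')]
```

Anything this returns is a quota increase you should request now, before the next traffic spike turns it into a page.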

Requesting more quota

If a quota is genuinely too low for your workload, you request an increase from the Quotas page. Some increases are granted automatically. Others, especially for resources that affect regional capacity planning, go through a review and can take a day or more. This matters for the exam because the right answer to a quota constraint is not always to provision around it. Sometimes it is to request the increase, and the question often hinges on whether the architect knows that path exists.

Operational habits that prevent the surprise

The Professional Cloud Architect exam frames a lot of operational questions around what a good architect does in advance versus what they do reactively. Quotas are a clean example. Reactive is checking the Quotas page after 503s start. Proactive is monitoring quota usage as a first-class metric, setting alerts before a feature launch, and requesting increases ahead of campaigns or game mode rollouts that will obviously push traffic up.

For the exam, when you see a scenario about a sudden traffic surge, autoscaling that does not respond, and 503 errors, the first action is almost always to check resource quotas. Other answers may be valid investigations, but they come after you have ruled out the most common cause of this exact failure mode.

My Professional Cloud Architect course covers resource quotas alongside the rest of the architecture and compliance material.
