What to know about Cloud Run for the Professional Cloud Architect Exam

Ben Makansi
March 28, 2026

Let's discuss some of what you need to know about Cloud Run for the Professional Cloud Architect exam.

Cloud Run is a fully managed serverless platform designed to run stateless containers that respond to HTTP requests. It works well for microservices, web applications, and APIs where you want to deploy code without thinking about the underlying infrastructure. No provisioning, no patching, no capacity planning. You write your code, package it in a container, deploy it, and Cloud Run handles the rest.
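To make this concrete, here is a minimal stateless service of the kind Cloud Run expects. The only Cloud Run-specific contract shown is the `PORT` environment variable, which Cloud Run injects to tell the container where to listen (8080 by default); everything else is plain Python standard library, and this is just one way to write such a service.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Each request is handled independently -- no state is kept between calls."""

    def do_GET(self):
        body = b"Hello from Cloud Run!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve():
    # Cloud Run tells the container which port to listen on via PORT.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), Handler).serve_forever()

if __name__ == "__main__":
    serve()
```

Packaged in a container image (for example with a small Dockerfile), this is all Cloud Run needs: a process that listens on `PORT` and answers HTTP requests.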

This makes it a good fit for teams that want to move fast and keep operational overhead low. Cloud Run has a good chance of showing up on the Professional Cloud Architect (PCA) exam, and the questions tend to focus on when to choose it, how it scales, and how to architect it for high availability.

Common Use Cases

Cloud Run supports a range of workloads, but a few patterns come up more often than others.

Intermittent traffic. Many applications experience traffic spikes followed by periods of inactivity. Cloud Run handles these fluctuations automatically by scaling up and down based on demand, which makes it a good option for applications with unpredictable or bursty traffic patterns.

Stateless containers. Cloud Run is built for workloads where each request is processed independently. Containers can be added or removed freely because there is no session state to preserve. We will cover this in more detail below.

Microservices. Cloud Run lets you deploy and scale individual components of an application independently. If your application is broken into decoupled services, each one can run as its own Cloud Run service with its own scaling behavior.

APIs. Cloud Run provides solid support for hosting HTTP-based APIs, whether they serve requests, handle data transformations, or sit behind other services. Its autoscaling and request-handling model maps well to API workloads.

The common thread across all of these is rapid scaling with minimal infrastructure management. That is the core value proposition of Cloud Run.

Autoscaling and Cold Starts

One of the key features of Cloud Run is automatic horizontal scaling. It adds or removes container instances in response to traffic rather than adjusting the compute resources of existing instances. Instances are created when traffic arrives and removed when it drops.

An important detail here is that Cloud Run can scale to zero. If there is no traffic, Cloud Run stops running instances entirely, and you stop paying. You only pay for what you use.

To illustrate how this works in practice: during steady traffic, you might see two instances running. As traffic increases, Cloud Run automatically scales up, maybe to six instances. When traffic drops to zero, the instance count goes to zero as well, and there are no charges for idle resources.
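The scaling behavior above can be sketched as a toy model. Cloud Run scales primarily on concurrent requests per instance (each instance serves up to a configurable concurrency, 80 by default), and the real autoscaler also considers CPU utilization and startup behavior, so treat this as an illustration of the divide-and-clamp idea, not the actual algorithm:

```python
import math

def estimate_instances(concurrent_requests, max_concurrency=80,
                       min_instances=0, max_instances=100):
    """Rough sketch of Cloud Run's scaling model (illustrative only).

    Each instance handles up to max_concurrency simultaneous requests,
    so the needed instance count is roughly the ceiling of
    concurrent_requests / max_concurrency, clamped between the
    configured minimum and maximum instance counts.
    """
    needed = math.ceil(concurrent_requests / max_concurrency)
    return max(min_instances, min(needed, max_instances))
```

With zero traffic the model returns zero instances (scale to zero), 450 concurrent requests at the default concurrency yields six, and setting `min_instances=2` keeps two instances even when idle, which is exactly the cold-start mitigation discussed next.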

Cold Starts

Scaling to zero is cost efficient, but it introduces a tradeoff called a cold start. When no instances are running and a new request comes in, Cloud Run needs to start up an instance from scratch. That startup time causes a delay, which can be problematic for latency-sensitive applications.

There are two common strategies to address this.

Set a minimum number of instances. You can configure Cloud Run to keep a minimum number of instances (often just one or two) running at all times. This eliminates cold starts because there is always an instance ready to handle traffic. The tradeoff is that you pay for those instances whether they are serving requests or not.

Pre-warming. This involves periodically sending requests to your service to keep instances active. By keeping instances warmed up, you avoid the startup delay without permanently reserving capacity.
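A pre-warmer can be as simple as a loop that pings the service on a schedule. The sketch below is a hypothetical in-process helper for illustration only; in practice you would usually drive the pings from something external, such as Cloud Scheduler, so the warmer does not disappear along with the instances it is trying to keep warm. The `fetch` parameter is injectable purely so the loop can be exercised without a network:

```python
import threading
import urllib.request

def start_prewarmer(url, interval_seconds=300, fetch=None):
    """Periodically hit a service URL so at least one instance stays warm.

    Hypothetical helper, not a Google-provided API. By default each ping
    issues a real GET; pass `fetch` to substitute a fake for testing.
    Returns a function that stops the loop.
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).status
    stopped = threading.Event()

    def ping():
        if stopped.is_set():
            return
        fetch(url)
        timer = threading.Timer(interval_seconds, ping)
        timer.daemon = True
        timer.start()

    ping()
    return stopped.set
```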

Both strategies help keep your service responsive. Cold starts and the strategies to mitigate them could show up on the PCA exam, so it is worth understanding when each approach makes sense.

Stateless Containers

Stateless means the container does not retain any data or session information between requests. Each request is processed independently, and nothing from one request carries over to the next.

This is the kind of workload Cloud Run is designed for. Because containers hold no state, Cloud Run can start and stop instances freely without affecting the application's functionality. There is no session data to lose and no local storage to preserve.

If you come across a PCA exam question that specifically mentions stateless containers, Cloud Run should be one of the first services you consider. It is built for exactly this pattern. Stateful workloads, where the application needs to remember things between requests, are better suited for other compute options like Compute Engine or GKE.
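One common way to turn a stateful design into a stateless one is to push all session data into an external store so that any instance can serve any request. In the sketch below, `ExternalStore` is an in-memory stand-in for a real external store such as Memorystore or Firestore; the point is that the handler itself keeps nothing between requests:

```python
class ExternalStore:
    """Stand-in for an external store (e.g., Memorystore or Firestore)
    that outlives any individual container instance."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value

def handle_request(store, session_id):
    """A stateless request handler: all state lives in `store`, so any
    instance can serve any request and instances can be added or
    removed freely."""
    count = store.get(session_id, 0) + 1
    store.set(session_id, count)
    return {"session": session_id, "visits": count}
```

Because the instance holds nothing, Cloud Run can route a user's second request to a completely different container and the visit count still increments correctly.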

Network Endpoint Groups

Sometimes you want to put a load balancer in front of Cloud Run to manage incoming traffic. The problem is that load balancers are traditionally used with infrastructure you manage directly, like Compute Engine VMs. They don't natively connect to serverless services like Cloud Run.

Network endpoint groups, or NEGs, bridge that gap. A NEG acts as an intermediary that registers your Cloud Run service as an endpoint the load balancer can route traffic to. The load balancer directs traffic to the NEG, and the NEG forwards it to Cloud Run.

A few things to remember here. Load balancers don't natively support serverless services, so NEGs are required to make that connection. The specific type used here is a serverless NEG, and it is most commonly paired with an external HTTPS load balancer, which provides secure and reliable traffic routing to Cloud Run services.

If you see a PCA exam question about using a load balancer with Cloud Run, the answer will almost certainly involve a NEG.

High Availability Architecture

A question you could see on the PCA exam is how to build a globally available application using Cloud Run. The pattern looks different from what you would do with Compute Engine or GKE, where the load balancer connects directly to instances or nodes.

Since Cloud Run is serverless, the load balancer has no direct infrastructure to connect to. Instead, you deploy separate Cloud Run services across multiple regions and use NEGs to connect each one to the load balancer.

Here is how the architecture works. Global users send requests to a cloud load balancer, which acts as the single entry point for all traffic. The load balancer routes each request based on user proximity and service availability, sending it to the nearest healthy region. Behind the load balancer, you have Cloud Run services deployed in multiple regions, for example us-central1, europe-west1, asia-east1, and australia-southeast1. Each regional service operates independently and handles requests on its own. NEGs connect each regional Cloud Run service to the load balancer.
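The routing decision at the heart of this architecture can be illustrated with a small function: among the regions that have a healthy deployment, pick the one closest to the user. This is a deliberate simplification of what a global external load balancer actually does, with hypothetical latency numbers standing in for proximity:

```python
def route_request(user_location, deployments, latency_ms):
    """Pick the lowest-latency healthy region for a request
    (an illustrative simplification of global load balancing).

    deployments: {region: is_healthy}
    latency_ms:  {(user_location, region): round-trip latency}
    """
    healthy = [region for region, ok in deployments.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy regions available")
    return min(healthy, key=lambda r: latency_ms[(user_location, r)])
```

The failover behavior falls out naturally: if the nearest region is marked unhealthy, the same selection logic sends the request to the next-closest healthy region, which is why multi-region deployment behind one load balancer yields high availability.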

This architecture combines Cloud Run's serverless scaling with the traffic management capabilities of load balancers and the flexibility of NEGs. The result is a system with high availability, low latency for users around the world, and efficient traffic distribution across regions.

This kind of global architecture question has a good chance of appearing on the PCA exam, particularly in scenarios that describe applications with a global user base and strict availability requirements.

Conclusion

Cloud Run runs stateless containers, scales automatically, and removes the need to manage servers. The exam-relevant concepts to focus on are autoscaling behavior (including cold starts and how to mitigate them), stateless containers as the target workload, NEGs as the bridge between load balancers and serverless services, and multi-region deployment patterns for high availability.

Understanding these areas should put you in good shape for most Cloud Run questions on the Professional Cloud Architect exam.
