
Cloud Run shows up on the Professional Cloud Architect exam as the default answer for a specific shape of workload, and the trick is recognizing that shape quickly. I want to walk through what Cloud Run is, the use cases it fits, how its autoscaling behavior creates the cold start problem, and what stateless actually means in this context.
Cloud Run is a fully managed, no-ops, serverless platform for running stateless containers that are invoked over HTTP. You hand Google a container image, and Google handles provisioning, scaling, and maintaining the infrastructure that runs it. There is no server for you to patch and no cluster for you to size.
That sentence is doing a lot of work, so let me break out the parts that matter for the Professional Cloud Architect exam:

- Fully managed and no-ops: Google provisions, scales, and patches the infrastructure. Your team's only deliverable is the container image.
- Serverless: capacity adjusts to traffic on its own, with no cluster to size and no servers to manage.
- Stateless containers: the workload keeps nothing in memory between requests, which is what lets the platform add and remove instances freely.
- Invoked over HTTP: the service is request-driven, running in response to traffic rather than continuously.
If a question describes a team that wants to ship a containerized service without managing infrastructure, and the workload is request-driven, Cloud Run is usually the right pick.
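To make "stateless container invoked over HTTP" concrete, here is a minimal sketch of the kind of service Cloud Run expects. The one real contract detail used here is that Cloud Run passes the listening port via the `PORT` environment variable (defaulting to 8080); everything else is illustrative.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Stateless handler: every request is served from scratch,
    with no per-user memory carried between calls."""

    def do_GET(self):
        body = b"hello from a stateless container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via $PORT (default 8080).
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), Handler).serve_forever()
```

Packaged into a container image, this is the whole deliverable: no cluster config, no instance sizing, just a process that answers HTTP on the port it is told to use.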
The exam-relevant use cases cluster around four patterns:

- Websites and web applications served from a container
- REST APIs and backends for mobile or web clients
- Lightweight, request-triggered data processing
- Webhooks and automation tasks that fire on demand
The unifying theme is rapid scaling with minimal infrastructure management. When a question hands you a workload with bursty traffic and a team that does not want to run a cluster, Cloud Run is the answer most of the time.
Cloud Run scales horizontally. It adds and removes instances in response to incoming traffic rather than resizing the compute on a single instance. Two instances might be running during normal traffic, six instances during a spike, and zero instances when no requests are coming in.
The scale-to-zero behavior is one of the things that makes Cloud Run cheap. If your service receives no traffic, Cloud Run runs no instances, and you pay nothing for compute during that period. For workloads that are quiet most of the day, this is a meaningful cost difference compared to running a VM or a GKE node continuously.
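The scaling behavior described above can be sketched as a toy model. This is not Cloud Run's actual algorithm; it just shows the shape of request-driven horizontal scaling, assuming each instance handles a fixed number of concurrent requests.

```python
import math

def instances_needed(concurrent_requests: int,
                     concurrency_per_instance: int = 80,
                     min_instances: int = 0,
                     max_instances: int = 100) -> int:
    """Toy model of request-driven horizontal scaling: run enough
    instances to cover current concurrency, clamped to the configured
    limits. Illustrative only, not Cloud Run's real autoscaler."""
    needed = math.ceil(concurrent_requests / concurrency_per_instance)
    return max(min_instances, min(needed, max_instances))

print(instances_needed(0))    # quiet period: 0 instances, no compute cost
print(instances_needed(150))  # normal traffic: 2 instances
print(instances_needed(480))  # spike: 6 instances
```

Note that with `min_instances=0`, zero traffic means zero instances, which is exactly the scale-to-zero cost behavior and also the origin of the cold start problem below.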
Scale to zero has a tradeoff: when a request arrives at a service that has scaled down to zero, Cloud Run has to start a new instance from scratch before it can serve the request. That startup time is a cold start, and it shows up to the user as added latency on that first request.
Two mitigations come up on the Professional Cloud Architect exam:

- Set a minimum number of instances, so at least one warm instance is always ready to serve the first request after an idle period. You trade some idle cost for consistent latency.
- Keep the container lean, so that cold starts hurt less when they do happen: a small image and fast startup shrink the latency window users can notice.
The exam framing to watch for: a question describes a Cloud Run service with latency-sensitive first requests after idle periods. The fix is minimum instances. If a question describes a team paying too much for idle capacity on a request-driven service, the answer often goes the other direction, and you let it scale to zero.
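The minimum-instances knob is configured per service. A sketch of both directions of the tradeoff, using a hypothetical service name and region:

```shell
# Latency-sensitive direction: keep one warm instance ready so the
# first request after an idle period does not pay a cold start.
gcloud run services update my-service \
  --region us-central1 \
  --min-instances 1

# Cost-sensitive direction: allow the service to scale back to zero
# and accept the cold start on the first request after idle.
gcloud run services update my-service \
  --region us-central1 \
  --min-instances 0
```

The flag maps directly onto the exam framing: minimum instances buys latency with idle cost, and zero buys cost savings with first-request latency.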
Stateless means the container does not retain any data or session information between requests. Each request is handled independently. The container does not care which instance previously served a user, and the platform does not preserve in-memory state when an instance shuts down.
This is the property that makes Cloud Run's scaling model work. Because instances do not hold onto state, Cloud Run can start them, stop them, and route traffic between them freely without breaking the application. If your workload needs to remember things between requests, that state has to live somewhere else, like Memorystore, Firestore, or Cloud SQL.
The opposite is stateful. A stateful workload retains data or session information across requests, so instances cannot be swapped or terminated without coordinating that state. Stateful workloads are not a good fit for Cloud Run, and the exam will usually steer you toward GKE or Compute Engine for those.
The pattern-matching shortcut: if a question explicitly mentions stateless containers, strongly consider Cloud Run. That phrasing is essentially the platform's tagline, and the exam uses it as a signal.
For Professional Cloud Architect questions about Cloud Run, the decision flow looks like this:

- Containerized, stateless, invoked over HTTP, and the team does not want to manage infrastructure? Pick Cloud Run.
- Stateful, or instances cannot be swapped and terminated freely? Steer toward GKE or Compute Engine instead.
- Latency-sensitive first requests after idle periods? Stay on Cloud Run and set minimum instances.
- Paying too much for idle capacity on a request-driven service? Stay on Cloud Run and let it scale to zero.
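That pattern matching can be condensed into a toy function. The labels are mine, not an official Google decision tree, but the branches mirror the exam signals.

```python
def pick_platform(containerized: bool, stateless: bool,
                  request_driven: bool, wants_no_ops: bool) -> str:
    """Toy pattern-matcher for the exam decision flow. Illustrative
    labels, not an official decision tree."""
    if not stateless:
        # Stateful workloads need coordinated instances, so the exam
        # steers toward platforms where you control the nodes.
        return "GKE or Compute Engine"
    if containerized and request_driven and wants_no_ops:
        return "Cloud Run"
    return "look closer at the requirements"

assert pick_platform(True, True, True, True) == "Cloud Run"
assert pick_platform(True, False, True, True) == "GKE or Compute Engine"
```

The first branch being statefulness is deliberate: it is the one property that disqualifies Cloud Run outright, regardless of how well everything else fits.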
That covers the Cloud Run material the Professional Cloud Architect exam will test you on at this level: what it is, when to pick it, why it scales the way it does, and how to handle the cold start tradeoff that comes with scale to zero.
My Professional Cloud Architect course covers Cloud Run alongside the rest of the containers and serverless material.