App Engine has three scaling modes, and the Associate Cloud Engineer exam expects you to know what each does and when to pick which. This article covers all three, the most important difference between App Engine and Cloud Run scaling, and the exam patterns that test these.
It does not cover instance class pricing, target latency tuning, or the distinction between App Engine Standard and Flex environments at full depth. Those are real, but the exam usually tests the high-level scaling-mode choice.
Automatic scaling. App Engine watches incoming traffic and starts or stops instances based on demand. You configure a minimum and a maximum number of instances. You can also set targets for things like latency and concurrency. This is the default and the right choice for most workloads.
Basic scaling. App Engine starts an instance when a request arrives, and shuts it down when there is no traffic for a while. There is no continuous load monitoring like with automatic scaling. You set a maximum number of instances and an idle timeout. This is appropriate for workloads that are infrequent and where it is fine to wait for an instance to spin up on demand.
Manual scaling. App Engine runs a fixed number of instances that you specify. Instances stay up regardless of traffic. They do not scale based on load. This is for workloads that need persistent state in memory, long-running background tasks, or any case where you specifically do not want App Engine to spin instances up and down.
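As a rough mental model of the three modes, the sketch below computes how many instances would be up for a given level of demand. It is a toy illustration, not App Engine's actual scheduler: the function name `instances_running`, the `demand` parameter (how many instances current traffic would need), and the default values are all assumptions made up for this example.

```python
def instances_running(mode, demand, min_instances=1, max_instances=10,
                      fixed_instances=3):
    """Toy model: instances up for a given demand (hypothetical helper)."""
    if mode == "automatic":
        # Follows demand, clamped between the configured minimum and maximum.
        return max(min_instances, min(demand, max_instances))
    if mode == "basic":
        # Starts instances when requests arrive; with no traffic (and the
        # idle timeout elapsed) nothing is left running.
        return min(demand, max_instances)
    if mode == "manual":
        # Fixed count, regardless of traffic.
        return fixed_instances
    raise ValueError(f"unknown scaling mode: {mode}")


for mode in ("automatic", "basic", "manual"):
    print(mode, [instances_running(mode, d) for d in (0, 4, 50)])
```

The point of the model is the shape of each response to demand: automatic tracks it within bounds, basic drops to nothing when idle, and manual ignores it entirely.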
Automatic is the default and the right answer for almost every general-purpose web app or API. If a question describes a customer-facing application with variable traffic and you do not see anything else specific, automatic scaling is the answer.
Basic is the niche middle option. It fits workloads that are bursty and tolerant of cold starts: internal tools, occasionally used admin endpoints, and other things that do not need a constantly warm instance.
Manual is the right answer when the question mentions a need for in-memory state, long-running tasks, or websockets. Manual scaling instances are stable and persistent in a way the other two are not.
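The selection rules above can be written down as a small decision function. This is only a study aid under stated assumptions: `pick_scaling_mode` and its three boolean parameters are hypothetical names invented here, and real workloads (and exam questions) may involve considerations this sketch ignores.

```python
def pick_scaling_mode(needs_in_memory_state=False,
                      long_running_tasks=False,
                      infrequent_and_cold_start_tolerant=False):
    """Map workload traits to a scaling mode (hypothetical helper)."""
    # Persistent in-memory state or long-running work: instances must not
    # come and go, so a fixed instance count is the fit.
    if needs_in_memory_state or long_running_tasks:
        return "manual"
    # Occasional traffic that can wait for an instance to spin up.
    if infrequent_and_cold_start_tolerant:
        return "basic"
    # Everything else, including variable customer-facing traffic.
    return "automatic"


print(pick_scaling_mode())                                         # automatic
print(pick_scaling_mode(infrequent_and_cold_start_tolerant=True))  # basic
print(pick_scaling_mode(needs_in_memory_state=True))               # manual
```

Checking the manual-scaling clues first mirrors the reasoning in the text: a need for stable, persistent instances overrides any traffic pattern.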
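This is the comparison the Associate Cloud Engineer exam tests most often. For exam purposes, treat App Engine automatic scaling as keeping at least one instance running: that is always true in the flexible environment, and true in the standard environment whenever min_instances is set to 1 or more. Cloud Run scales to zero by default. If traffic stops, all Cloud Run instances go away.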
That difference shows up on the exam in two ways. If a question describes a service that can sit idle and the team wants to pay nothing during idle periods, Cloud Run is the answer because of scale-to-zero. If a question describes a service that needs at least one instance always available for instant response, App Engine automatic scaling fits naturally because it never goes to zero.
The trade-off is the same one behind Cloud Run cold starts, just expressed in App Engine terms. Always-warm costs more. Scale-to-zero saves money but adds latency on the first request.
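A quick back-of-the-envelope calculation makes the cost side of the trade-off concrete. The sketch below counts instance-hours accrued while a service is idle. The function name `idle_instance_hours` and the figures (16 idle hours a day over 30 days) are assumptions chosen for illustration, and it deliberately stops at instance-hours rather than prices, since actual billing depends on details this article does not cover.

```python
def idle_instance_hours(idle_hours_per_day, days, warm_instances):
    """Instance-hours accrued while no traffic arrives (hypothetical helper)."""
    return idle_hours_per_day * days * warm_instances


# Hypothetical service that is idle 16 hours a day for a 30-day month.
always_warm = idle_instance_hours(16, 30, warm_instances=1)
scale_to_zero = idle_instance_hours(16, 30, warm_instances=0)

print(always_warm)    # 480 instance-hours kept warm with no traffic
print(scale_to_zero)  # 0, at the price of a cold start on the next request
```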
App Engine scaling is configured in app.yaml. Here is what automatic scaling looks like:
runtime: python39
automatic_scaling:
  min_instances: 1
  max_instances: 10
  target_cpu_utilization: 0.6
Basic scaling looks like this:
runtime: python39
basic_scaling:
  max_instances: 5
  idle_timeout: 10m
Manual scaling is the simplest:
runtime: python39
manual_scaling:
  instances: 3
Each App Engine service has exactly one of these. You cannot mix scaling modes within a single service.
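The one-mode-per-service rule is easy to check mechanically. The sketch below assumes the app.yaml has already been parsed into a plain Python dict (for example by a YAML library, which is not shown). The helper name `scaling_mode` is hypothetical, and this is an illustration of the rule, not how App Engine itself validates configuration.

```python
SCALING_KEYS = ("automatic_scaling", "basic_scaling", "manual_scaling")


def scaling_mode(config):
    """Return the scaling mode named in a parsed app.yaml dict."""
    present = [key for key in SCALING_KEYS if key in config]
    if len(present) > 1:
        raise ValueError(f"only one scaling mode is allowed, found: {present}")
    # With no scaling block at all, the default (automatic) applies.
    return present[0] if present else "automatic_scaling"


config = {"runtime": "python39", "manual_scaling": {"instances": 3}}
print(scaling_mode(config))                   # manual_scaling
print(scaling_mode({"runtime": "python39"}))  # automatic_scaling
```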
If you see a question about a typical web app on App Engine that needs to handle variable traffic, the answer is automatic scaling. This is the most common case.
If you see a question about an App Engine service that needs to maintain in-memory state or run long-running background tasks, the answer is manual scaling. The clue word is usually persistent or stateful or some hint that instances cannot just disappear.
If you see a question contrasting App Engine and Cloud Run for a service that needs to pay nothing during idle periods, the answer is Cloud Run because App Engine automatic scaling does not scale to zero.
Automatic for variable traffic and most general workloads. Basic for occasional, bursty traffic. Manual for stateful or long-running workloads. The biggest gotcha vs Cloud Run is that App Engine automatic scaling never goes to zero, which makes it the wrong choice if cost during idle periods is the priority.
My Associate Cloud Engineer course covers App Engine scaling modes alongside the Cloud Run comparison that the exam tests.