
Session affinity is one of those Cloud Load Balancing features that sounds simple on the surface but turns into a genuinely useful Professional Cloud Architect exam topic once you start asking when to actually turn it on. The exam likes scenarios where a workload misbehaves under default round-robin distribution, and session affinity is the lever that fixes it.
By default, a Cloud Load Balancer distributes incoming requests across healthy backends. Each request can land on a different backend, which is exactly what you want for stateless services. But some workloads keep state on the backend itself, and that state is tied to a specific instance. If the next request from the same client lands on a different backend, that state is gone.
Session affinity tells the load balancer to send all requests from a given client during a session back to the same backend. The load balancer identifies the session, often through a cookie or a header, and uses that identifier to route consistently. The diagram I work through has a single user sending multiple requests with a session identifier like 123abc, and the load balancer keeps every one of those requests pinned to the same backend server for the life of the session.
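The pinning behavior can be sketched in a few lines. This is a minimal illustration, not how the load balancer is actually implemented: a real HTTP(S) Load Balancer tracks sessions with a generated cookie, while here a hash of the session ID stands in for that tracking, and the backend names are made up.

```python
import hashlib

# Hypothetical backend pool; names are illustrative, not real instances.
BACKENDS = ["backend-a", "backend-b", "backend-c"]

def pick_backend(session_id: str) -> str:
    """Map a session identifier to one backend deterministically.

    Hashing the session ID is a stand-in for the load balancer's
    cookie-based tracking: identical IDs always land on the same backend.
    """
    digest = hashlib.sha256(session_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(BACKENDS)
    return BACKENDS[index]

# Every request carrying session 123abc routes to the same backend.
targets = {pick_backend("123abc") for _ in range(5)}
assert len(targets) == 1
```

The property that matters is the last assertion: however many requests arrive with the same session identifier, they all resolve to one backend for the life of the session.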
HTTP(S) Load Balancers are the strongest option here because they operate at Layer 7. They can read application-layer details like cookies and headers, which gives them enough context to track a session properly. Lower-layer load balancers can offer affinity based on client IP, but the application-aware option is more precise and survives things like NAT and shared egress IPs.
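The NAT problem is easy to demonstrate with the same kind of sketch. Assume three users in one office sharing a single egress IP (all values here are hypothetical): IP-based affinity sees one client and funnels everyone to one backend, while a per-session cookie keeps them distinguishable.

```python
import hashlib

BACKENDS = ["backend-a", "backend-b", "backend-c"]

def by_key(key: str) -> str:
    # Deterministic key-to-backend mapping, standing in for affinity logic.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return BACKENDS[h % len(BACKENDS)]

# Three distinct sessions, one shared NAT egress IP (made-up values).
sessions = ["sess-1", "sess-2", "sess-3"]
shared_ip = "203.0.113.7"

# Client-IP affinity: every session keys on the same IP, so all three
# users collapse onto a single backend.
ip_targets = {by_key(shared_ip) for _ in sessions}
assert len(ip_targets) == 1

# Cookie-based affinity: each session keys on its own ID, so the users
# can spread across backends while each stays pinned individually.
cookie_targets = {by_key(s) for s in sessions}
```

Same mechanism, different key: choosing the session identifier rather than the source IP is what makes Layer 7 affinity both more precise and NAT-proof.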
The use cases that show up on the Professional Cloud Architect exam are the stateful ones. An online shopping cart is the classic example. A customer adds items, the backend stores cart state in memory or a local cache, and every subsequent request needs the same backend to see that cart. Without affinity, the customer hits a different instance and the cart looks empty.
Chat applications follow the same pattern. The backend handling a user's session holds the in-memory state of the conversation, and routing a follow-up message to a different instance breaks continuity and adds latency. Real-time messaging is sensitive to that kind of jitter.
WebSocket connections are an even stronger case. A WebSocket starts as an HTTP request that is upgraded to a persistent, bidirectional connection, and it stays open without renegotiation. The load balancer needs to keep that connection pinned to one backend for as long as it lives. Notifications, live streaming, and any bidirectional messaging depend on this.
Gaming workloads round out the list. Multiplayer games hold per-player state on a backend during a match, and consistent routing keeps player interactions synchronized.
Session affinity is a constraint on the load balancer. The whole point of distributing traffic is to spread load evenly across backends, and affinity reduces that flexibility. If one backend ends up handling a disproportionate number of long-lived sessions, you can get hot spots that defeat the purpose of horizontal scaling.
The right call on the exam, and in practice, is to ask whether the workload genuinely needs server-side session state. If you can move state into a shared store like Memorystore or a database, the application becomes stateless and any backend can serve any request. That is almost always the better answer for new designs. Session affinity is the right tool when you cannot rearchitect the application or when the protocol itself, like WebSocket, demands a persistent connection.
The other Cloud Load Balancing scenario worth knowing for the Professional Cloud Architect exam is global ingestion of IoT data. Picture a fleet of connected vehicles distributed across the U.S., Germany, and India, all generating telemetry that needs to land somewhere for predictive maintenance.
The architecture uses a global HTTP(S) Load Balancer as the single entry point. Devices send data to one global IP, and the load balancer routes each request to the nearest regional Compute Engine instance group. Vehicles in the U.S. hit instances in us-central1, German vehicles route to europe-west3, and Indian vehicles go to asia-south2. This minimizes latency for each device and keeps regional failures contained.
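The routing outcome can be summarized as a lookup table. To be clear, this is not configuration: a global HTTP(S) Load Balancer picks the nearest healthy backend automatically via its anycast IP. The table and the fallback choice below are assumptions made purely to illustrate the result.

```python
# Illustrative geo-to-region outcomes for the vehicle fleet scenario.
NEAREST_REGION = {
    "US": "us-central1",
    "DE": "europe-west3",
    "IN": "asia-south2",
}

def route(country_code: str) -> str:
    # Fallback region for unmapped countries is an arbitrary assumption.
    return NEAREST_REGION.get(country_code, "us-central1")

assert route("DE") == "europe-west3"
```

Each vehicle talks to one global IP; the load balancer, not the device, is responsible for producing this mapping.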
Once the data is processed at the edge, it flows into the rest of the analytics pipeline. Pub/Sub absorbs the streams, Dataflow handles transformation, and BigQuery stores the result for analysis. The load balancer is the front door, and the regional Compute Engine groups are the local processors that feed a centralized data plane.
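The shape of that pipeline can be sketched with in-memory stand-ins: a queue plays the role of Pub/Sub, a transform function plays Dataflow, and a list of rows plays the BigQuery table. The field names and the maintenance threshold are invented for the example; none of this uses the real client libraries.

```python
from queue import Queue

telemetry_topic: Queue = Queue()   # stand-in for a Pub/Sub topic
warehouse_rows: list[dict] = []    # stand-in for a BigQuery table

def ingest(region: str, vehicle_id: str, engine_temp_c: float) -> None:
    # A regional backend publishes raw telemetry to the topic.
    telemetry_topic.put(
        {"region": region, "vehicle": vehicle_id, "engine_temp_c": engine_temp_c}
    )

def transform_and_load() -> None:
    # "Dataflow" step: enrich each message, then load it into the table.
    # The 110 C threshold is a made-up predictive-maintenance rule.
    while not telemetry_topic.empty():
        msg = telemetry_topic.get()
        msg["needs_service"] = msg["engine_temp_c"] > 110.0
        warehouse_rows.append(msg)

ingest("us-central1", "veh-001", 95.0)
ingest("europe-west3", "veh-002", 118.5)
transform_and_load()
assert [r["needs_service"] for r in warehouse_rows] == [False, True]
```

The division of labor mirrors the real architecture: regional backends only publish, the streaming step owns the transformation, and the warehouse receives finished rows.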
This is a clean pattern to recognize on the exam because it combines three ideas at once. Global load balancing for a single ingress point, regional backends for low latency, and a streaming pipeline for downstream processing.
Session affinity solves a specific problem: routing the same client back to the same backend when the backend holds session state. HTTP(S) Load Balancers are the right product because they can use application-layer signals to track sessions. Reach for it when the workload is genuinely stateful or when the protocol requires persistent connections, and prefer a stateless architecture with shared state stores when you have the option.
My Professional Cloud Architect course covers session affinity and global load balancing patterns alongside the rest of the networking material.