Cloud SQL Replication Lag and Sharding for the PCA Exam

GCP Study Hub
Ben Makansi
April 6, 2026

Replication lag is one of those topics that comes up in Professional Cloud Architect exam questions about Cloud SQL performance, and it pairs naturally with sharding because sharding is a common way to reduce it. I want to walk through both concepts the way they actually show up on the exam.

What replication lag is

Replication lag is the delay between a change on the primary database and when that change shows up on its read replicas. Replication is not instantaneous. Updates have to be transmitted from the primary to each replica and then applied, and that takes time.

Some lag is always present. It becomes a real problem in write-heavy workloads. Ecommerce checkout is the canonical example. A user submits an order, the write hits the primary, and then a follow-up read for order confirmation goes to a read replica that has not yet received the update. The user sees stale data and the application looks broken.

The standard Cloud SQL pattern that creates this risk is the one where the application sends all writes to the primary and distributes reads across read replicas to spread load. That distribution is exactly what makes replicas useful, but it also exposes the lag window.
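The read/write split above is usually implemented in the application's data layer. A minimal sketch of that standard pattern, with hypothetical hostnames standing in for real Cloud SQL connection settings:

```python
import itertools

# Hypothetical hostnames; a real app would load these from configuration.
PRIMARY = "primary.db.internal"
REPLICAS = ["replica-1.db.internal", "replica-2.db.internal"]

# Round-robin over the replicas to spread read load evenly.
_replica_cycle = itertools.cycle(REPLICAS)

def route(is_write: bool) -> str:
    """Return the host a query should be sent to."""
    if is_write:
        return PRIMARY           # every write goes to the primary
    return next(_replica_cycle)  # reads fan out across the replicas
```

Every read served by `route(is_write=False)` sits inside the lag window: it may land on a replica that has not yet applied the latest write.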

How to manage replication lag

There are a few levers, and the Professional Cloud Architect exam expects you to recognize them in scenario questions:

  • Tune replication settings on the Cloud SQL instance.
  • Use a stronger consistency model when the workload demands it.
  • Route reads that need up-to-date data to the primary instead of a replica.

That last one is the most practical and the most commonly correct answer on the exam. If a read absolutely needs the latest write, send it to the primary. Use replicas only for reads where slightly stale data is acceptable.
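One way to express that routing rule in code is a per-query freshness flag. This is a sketch under assumed hostnames, not a Cloud SQL API; the decision of which reads "need the latest write" belongs to the application:

```python
# Hypothetical hostnames standing in for real connection settings.
PRIMARY = "primary.db.internal"
REPLICA = "replica-1.db.internal"

def route_read(needs_latest_write: bool) -> str:
    """Send a read to the primary only when staleness is unacceptable."""
    return PRIMARY if needs_latest_write else REPLICA

# Order confirmation right after checkout must see the new row:
confirmation_host = route_read(needs_latest_write=True)   # primary
# Browsing the product catalog tolerates slightly stale data:
catalog_host = route_read(needs_latest_write=False)       # replica
```

The cost of this pattern is that every freshness-critical read adds load back onto the primary, which is why it pairs with, rather than replaces, the other levers.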

Sharding to reduce replication lag

The other strategy you need to know is database sharding. Sharding partitions the data on the primary into smaller independent pieces called shards, and each shard can live on its own server.

The reason this reduces replication lag is mechanical. Instead of one primary handling every write for the entire dataset, writes are distributed across multiple shards. Each individual database server is now processing fewer updates, so the volume of changes that need to propagate from any one shard to its replicas drops. Less write contention per server means replication catches up faster.

Architecturally, the picture changes from one primary fanning out to read replicas, to multiple shards each fanning out to their own replicas. The application now routes writes based on which shard owns the relevant data, and reads can still go to the appropriate shard's replicas with a smaller delay.
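A common way to do that routing is to hash a sharding key to pick the shard, then apply the same write-to-primary, read-from-replica split within it. A minimal sketch with a hypothetical two-shard map (the hostnames and shard count are illustrative only):

```python
import hashlib

# Hypothetical shard map: each shard has its own primary and replicas.
SHARDS = {
    0: {"primary": "shard0-primary", "replicas": ["shard0-replica"]},
    1: {"primary": "shard1-primary", "replicas": ["shard1-replica"]},
}

def shard_for(key: str) -> int:
    """Hash the sharding key (e.g. a customer ID) to a shard number.

    Hashing gives a deterministic, roughly even spread of keys,
    so write volume is divided across the shard primaries.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % len(SHARDS)

def route(key: str, is_write: bool) -> str:
    """Route a query to the owning shard's primary or a replica."""
    shard = SHARDS[shard_for(key)]
    return shard["primary"] if is_write else shard["replicas"][0]
```

Because each shard's primary now sees only a fraction of the total writes, its replicas have less change volume to apply and stay closer to current.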

The trade-off the exam wants you to recognize

Sharding is not free. It adds complexity around how data is distributed across shards and how the application routes queries to the right shard. You need a sharding key, and queries that span multiple shards become harder.
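The cross-shard pain is easy to see in a sketch: a query that does not include the sharding key has to scatter to every shard and merge the partial results in the application layer. Hostnames and the callback shape here are illustrative assumptions, not a real client API:

```python
# Hypothetical shard primaries; in practice each would be a connection.
SHARD_PRIMARIES = ["shard0-primary", "shard1-primary", "shard2-primary"]

def count_rows_everywhere(count_on_shard) -> int:
    """Scatter-gather: run the count on every shard, merge in the app.

    `count_on_shard` stands in for executing the query against one
    shard's primary. Without a shard key there is no way to narrow
    the query to a single shard, so every shard pays the cost.
    """
    return sum(count_on_shard(host) for host in SHARD_PRIMARIES)
```

A query that does carry the shard key touches exactly one shard; everything else turns into this fan-out, which is the complexity the exam wants you to weigh.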

The Professional Cloud Architect exam tends to test this as a trade-off. If a question describes a write-heavy workload struggling with replication lag and asks for the architectural fix, sharding is the answer. If a question asks which approach minimizes operational complexity, sharding is usually the wrong answer, and the right one is something like routing critical reads to the primary or moving to a managed service that handles distribution natively.

The mental model I use: replication lag is about the gap between primary and replica, and sharding shrinks that gap by spreading writes across more primaries. Routing critical reads to the primary sidesteps the gap entirely for the reads that matter most.

My Professional Cloud Architect course covers Cloud SQL replication lag and sharding alongside the rest of the databases material.