Monitoring Redis on Memorystore for the PDE Exam

GCP Study Hub
April 27, 2026

Memorystore for Redis questions on the Professional Data Engineer exam often shift away from "how do I provision a cache?" and toward "how do I know my cache is healthy, and how do I know when it's time to scale?" That second question is where Cloud Monitoring comes in, and it's the part most candidates underprepare for. In this article I want to walk through the Redis metrics that matter on Memorystore, the alerting policies you should be able to describe on exam day, and the scaling decision that almost always comes up when memory pressure rises.

Why Memorystore monitoring shows up on the PDE exam

Memorystore for Redis is a fully managed cache, but "managed" does not mean "hands off." The Professional Data Engineer exam expects you to treat the cache like any other production data system. That means you need observability into how Redis is performing, you need alerting policies wired into Cloud Monitoring, and you need a clear rule for when the instance has outgrown its current tier or size. The exam tends to test this through scenario questions where an application's tail latency is climbing or eviction rates are spiking, and the right answer almost always involves a specific metric, not a vibe.

Memorystore publishes Redis metrics directly to Cloud Monitoring under the redis.googleapis.com namespace, so you do not have to install any agent or sidecar. The metrics are there as soon as the instance is provisioned. Your job is to know which ones to watch.
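You can sanity-check this before building anything. The Monitoring API's timeSeries.list endpoint returns data points for the instance as soon as it is up. Here is a minimal sketch with curl, assuming PROJECT_ID is your project and you are authenticated locally with gcloud (the date invocation uses GNU date syntax; adjust on macOS):

# Pull the last hour of System Memory Usage Ratio for all Redis instances
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type="redis.googleapis.com/stats/memory/system_memory_usage_ratio" AND resource.type="redis_instance"' \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${END}" \
  "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries"

If that call comes back with points, the pipeline is live and everything else in this article is a matter of wiring.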

The core Redis metrics to know

There are a handful of metrics I expect every Professional Data Engineer candidate to recognize. These are the ones I would memorize before exam day; the matching metric type paths are listed right after the bullets.

  • System Memory Usage Ratio. This is the headline metric. It reports the percentage of total Redis memory in use. When this number climbs, you are running out of room to cache new keys, and Redis will start evicting older entries or rejecting writes depending on your maxmemory-policy. Google's guidance is to alert at 80% and treat that as the trigger to upgrade.
  • Cache hit ratio. A cache that is not being hit is just a costly side service. Hit ratio tells you how often reads are being served from Redis versus falling through to the source of truth. A falling hit ratio under steady traffic usually means evictions are kicking in or the working set has outgrown the instance.
  • Evicted keys. This counter increments when Redis has to drop keys to make room. A small steady eviction rate can be normal with an LRU policy, but a sharp rise usually means the cache is undersized for its workload.
  • Connection count. Tracks the number of clients connected to the instance. Spikes here can indicate connection leaks in application code, or a fan-out pattern that is pushing toward the per-tier connection limit.
  • Latency and CPU. CPU utilization and command latency both surface saturation. If CPU is pinned and commands are queueing, no amount of memory headroom will save you. You scale tier or size in that case.
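For reference, those friendly names correspond to metric types under the redis.googleapis.com namespace. The paths below are the ones I would expect to use in alert filters; confirm the exact spellings in Metrics Explorer before wiring anything to a pager:

redis.googleapis.com/stats/memory/system_memory_usage_ratio   # memory pressure
redis.googleapis.com/stats/cache_hit_ratio                    # cache hit ratio
redis.googleapis.com/stats/evicted_keys                       # eviction counter
redis.googleapis.com/clients/connected                        # connection count
redis.googleapis.com/stats/cpu_utilization                    # CPU seconds consumed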

Setting an 80% memory alert in Cloud Monitoring

The single most testable point on this topic is the alerting policy on System Memory Usage Ratio. The pattern is straightforward. You create an alerting policy in Cloud Monitoring, point it at the Memorystore Redis metric, and configure the condition to fire when the ratio stays above 0.8 for a sustained window. From there you wire it to a notification channel so your on-call rotation actually hears about it.

If you are setting this up via the gcloud CLI, the shape of the command looks like this.

gcloud alpha monitoring policies create \
  --display-name="Redis Memory Usage 80%" \
  --notification-channels=projects/PROJECT_ID/notificationChannels/CHANNEL_ID \
  --combiner=OR \
  --condition-display-name="System Memory Usage Ratio > 0.8" \
  --condition-filter='metric.type="redis.googleapis.com/stats/memory/system_memory_usage_ratio" AND resource.type="redis_instance"' \
  --if='> 0.8' \
  --duration=300s

The five-minute duration matters. You do not want a single momentary spike to wake up your on-call. You want a sustained signal that pressure is building.
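One prerequisite: the CHANNEL_ID in that command has to point at a notification channel that already exists. If you need to create one, the shape is similar. This sketch assumes an email channel and a placeholder on-call address:

# Create an email notification channel; the command prints the channel's ID
gcloud beta monitoring channels create \
  --display-name="Data platform on-call" \
  --type=email \
  --channel-labels=email_address=oncall@example.com

# Recover channel IDs later if you lose track of them
gcloud beta monitoring channels list --format="value(name,displayName)"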

What to do when the alert fires

This is where the Professional Data Engineer exam tries to trip you up. When System Memory Usage Ratio crosses 80%, Google's recommendation is to upgrade. That can mean one of two things.

  • Resize the instance to a larger memory footprint within the same tier. If you are on the Standard tier at 13 GB, moving to 20 or 26 GB gives you headroom without changing the topology; the resize is a single gcloud call, as shown after this list.
  • Upgrade to a higher service tier. Moving from Basic to Standard gives you a replica for automatic failover. If you are already on Standard and bumping against read throughput ceilings, enabling read replicas adds horizontal read capacity.
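Resizing within a tier is a one-line operation. A sketch, assuming an instance named my-cache in us-central1:

# Grow the instance to 20 GB in place
gcloud redis instances update my-cache \
  --size=20 \
  --region=us-central1

One caveat worth knowing for scenario questions: resizing a Basic tier instance triggers a full cache flush, while a Standard tier instance keeps its data through the resize because of the replica.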

The wrong answers on exam questions tend to be "flush the cache," "lower the maxmemory setting," or "add more application instances." None of those address the underlying capacity problem. The right answer is almost always to grow the instance.

Scaling triggers beyond memory

Memory is the primary scaling trigger, but it is not the only one. If the eviction count climbs while the memory ratio sits well below 80%, that usually points to an overly aggressive maxmemory-policy or misconfigured TTLs rather than a sizing problem. If CPU saturates before memory does, more memory will not help; Redis command processing is effectively single-threaded, so the lever is offloading reads to replicas on the Standard tier. And if the connection count is hitting the per-tier ceiling, the fix is application-side connection pooling, not a bigger Redis box.
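Memorystore blocks the CONFIG command, so eviction policy is managed through the instance's Redis configs rather than through redis-cli. A sketch of inspecting and changing the policy, again assuming my-cache in us-central1:

# See which Redis configs are currently set on the instance
gcloud redis instances describe my-cache --region=us-central1 \
  --format="value(redisConfigs)"

# Evict any key under LRU instead of only keys that carry a TTL
gcloud redis instances update my-cache --region=us-central1 \
  --update-redis-config maxmemory-policy=allkeys-lru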

For exam scenarios, map the symptom to the metric, and map the metric to the action.

  • Memory ratio high: upgrade the instance size or tier.
  • Evictions high with memory ratio low: tune the eviction policy and TTLs.
  • Latency high with CPU pinned: scale the tier or add read replicas.
  • Connection count near the ceiling: pool connections on the client.

Putting it together for the exam

If you remember three things from this article, make them these. First, the System Memory Usage Ratio metric in Cloud Monitoring is the canonical signal for Memorystore Redis health. Second, the 80% threshold with a sustained alert is the textbook configuration Google expects you to know. Third, the response to that alert is to upgrade the instance to a larger size or higher tier, not to patch around it.

My Professional Data Engineer course covers Memorystore monitoring alongside the rest of the operational and observability topics on the exam, with worked examples for the alerting policies you are most likely to see in scenario questions.
