Pub/Sub Push vs Pull Subscriptions for the PDE Exam

GCP Study Hub
August 27, 2025

Pub/Sub is the messaging backbone for almost every streaming data pipeline you will build on Google Cloud, and the Professional Data Engineer exam loves to test whether you understand how subscribers actually receive messages. A topic delivers nothing until it has at least one subscription (messages published before a subscription exists are simply dropped), and when you create that subscription you pick a delivery model: push or pull. The two models look similar on paper but behave very differently in production, and the exam will hand you a scenario where only one of them fits.

I want to walk through how I think about the two, what each one is actually doing under the hood, and the cues in an exam question that point you toward one answer over the other.

Pull subscriptions: the subscriber drives

In a pull subscription, the subscriber initiates the request. Your application, Dataflow job, or worker process explicitly asks Pub/Sub for messages, processes them, and then sends back an acknowledgment. Pub/Sub is sitting there holding messages and waiting to be asked.

This model is a good fit when you are processing a large volume of messages and you want some control over the pace of consumption. Batch delivery works well here because the subscriber can grab many messages in a single request, chew through them, and ack them as a group. If a downstream system is slow or temporarily down, your pull subscriber can simply slow down its requests without losing data. Pub/Sub holds onto unacked messages until the retention window expires.
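The pull pattern can be sketched with a toy in-memory broker. To be clear, FakeBroker is invented here purely for illustration (the real client library, google-cloud-pubsub, handles this plumbing against the actual service); the point is the shape of the loop: the subscriber asks for a batch, processes it, and acks, and anything unacked stays with the broker.

```python
from collections import deque

class FakeBroker:
    """Toy stand-in for Pub/Sub: holds messages until they are acked."""
    def __init__(self, messages):
        self.pending = deque(messages)   # not yet delivered
        self.unacked = {}                # delivered but not yet acked

    def pull(self, max_messages):
        """Subscriber-initiated: hand out up to max_messages with ack IDs."""
        batch = []
        while self.pending and len(batch) < max_messages:
            msg = self.pending.popleft()
            ack_id = f"ack-{len(self.unacked) + len(batch)}"
            self.unacked[ack_id] = msg
            batch.append((ack_id, msg))
        return batch

    def acknowledge(self, ack_ids):
        """Acked messages are done; unacked ones would be redelivered."""
        for ack_id in ack_ids:
            self.unacked.pop(ack_id, None)

broker = FakeBroker(["m1", "m2", "m3", "m4", "m5"])
batch = broker.pull(max_messages=3)          # grab a batch in one request
broker.acknowledge([a for a, _ in batch])    # process, then ack as a group
print(len(broker.pending))                   # 2 messages still held safely
```

Notice that the subscriber controls the pace entirely: if downstream is slow, it just calls `pull` less often, and nothing is lost.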

The tradeoff is code complexity. With pull, the subscriber owns the logic for requesting messages, handling acks, managing flow control, retrying on failure, and scaling horizontally. You usually use a client library that handles much of this for you, but it is still more work than handing Pub/Sub a URL and walking away.

Push subscriptions: Pub/Sub drives

In a push subscription, Pub/Sub flips the direction. You give Pub/Sub a webhook URL, and Pub/Sub sends each message to that endpoint as an HTTPS POST request. The subscriber does not have to ask. It just has to be ready to receive.

The subscriber endpoint must be a webhook that accepts POST over HTTPS. That is a hard requirement. Cloud Run, Cloud Functions, App Engine, and any HTTPS endpoint you control all qualify. The endpoint returns a success status code to ack the message, or a failure code to make Pub/Sub retry.
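A push delivery arrives as a JSON envelope whose `message.data` field is base64-encoded. Here is a minimal handler sketch; the function name and the specific status-code choices are illustrative assumptions, though the envelope shape matches the documented push format:

```python
import base64
import json

def handle_push(body: bytes) -> int:
    """Decode a Pub/Sub push envelope and return an HTTP status code.

    A success status acks the message; anything else makes Pub/Sub retry.
    """
    try:
        envelope = json.loads(body)
        message = envelope["message"]
        data = base64.b64decode(message.get("data", "")).decode("utf-8")
        # ... process `data` here; an exception falls through to the retry path
        print(f"received: {data}")
        return 204  # success: Pub/Sub acks the message
    except Exception:
        return 500  # failure: Pub/Sub retries with backoff

envelope = json.dumps({
    "message": {"data": base64.b64encode(b"hello").decode(), "messageId": "1"},
    "subscription": "projects/demo/subscriptions/demo-push",
}).encode()
print(handle_push(envelope))  # 204
```

In a real deployment this function would sit behind an HTTPS server on Cloud Run, Cloud Functions, or App Engine; the framework wiring is all that changes.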

Push subscriptions are the right choice for low-latency, real-time streaming scenarios where you want messages delivered the instant they arrive at the topic. There is no polling delay because Pub/Sub initiates delivery immediately. You also write less code, because most of the heavy lifting around delivery is handled for you.

The catch is that your endpoint has to keep up. If your webhook is slow or returns errors, Pub/Sub will retry with exponential backoff, which can pile up traffic. Push also requires that the endpoint be publicly reachable over HTTPS, which has implications for authentication and network design.

How exam questions frame this choice

The Professional Data Engineer exam tends to give you a scenario and ask which subscription type fits best. The signal words I watch for:

  • Real-time, low latency, immediate, event-driven, Cloud Run handler, Cloud Functions trigger: these point to push.
  • Batch, high throughput, large volume, flow control, Dataflow pipeline, backpressure, on-premises subscriber: these point to pull.
  • Subscriber cannot accept inbound HTTPS or subscriber is behind a firewall: definitely pull, because push requires a reachable webhook.
  • Subscriber must be a webhook: this is push by definition.

A common trap is the assumption that push is always better because it is real-time. It is not. If the question describes a Dataflow streaming job consuming from Pub/Sub, that is a pull subscription, even though Dataflow processes messages in near real time. Dataflow uses the Pub/Sub I/O connector, which pulls under the hood. The exam expects you to know that.

A few other things worth keeping in mind

Both subscription types support ordering keys for in-order delivery within a key and dead-letter topics for messages that fail repeatedly. Exactly-once delivery is the exception: it is a configurable option on pull subscriptions only, so a scenario that pairs exactly-once delivery with a push endpoint does not fit. Beyond that, the delivery mode does not change the messaging semantics.
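These features are all flags at subscription-creation time. A sketch with gcloud (resource names are placeholders; note that `--enable-exactly-once-delivery` applies to pull subscriptions):

```shell
gcloud pubsub subscriptions create my-sub \
  --topic=my-topic \
  --enable-exactly-once-delivery \
  --enable-message-ordering \
  --dead-letter-topic=my-dead-letter-topic \
  --max-delivery-attempts=5
```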

You can also switch a subscription between push and pull after it is created, which is useful if your architecture evolves. You cannot, however, have a single subscription be both at the same time. If two different downstream systems need the same messages with different delivery modes, you create two subscriptions on the same topic.
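Both moves are a couple of gcloud commands. In this sketch the topic and subscription names are placeholders; a subscription created with `--push-endpoint` is push, one created without it is pull, and `modify-push-config` with an empty endpoint converts an existing push subscription back to pull:

```shell
# One topic, two subscriptions with different delivery modes.
gcloud pubsub subscriptions create orders-push \
  --topic=orders \
  --push-endpoint=https://handler.example.com/push
gcloud pubsub subscriptions create orders-pull --topic=orders

# Later, convert the push subscription to pull by clearing its endpoint.
gcloud pubsub subscriptions modify-push-config orders-push --push-endpoint=""
```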

The mental shortcut I use is: who initiates? If Pub/Sub initiates, it is push. If the subscriber initiates, it is pull. Everything else follows from that.

My Professional Data Engineer course covers Pub/Sub subscriptions, delivery semantics, and how to pair them with Dataflow and downstream sinks in the streaming and messaging modules.
