
Cloud Storage is rarely the end of the story in a data pipeline. An object lands in a bucket, and almost always something downstream needs to react to it. A Cloud Function should fire, a Pub/Sub topic should get a message, or a Composer DAG should pick up the new data and run a job. The Professional Data Engineer exam tests whether you understand how that reaction is wired together, and the answer comes down to Cloud Storage notifications.
I want to walk through how these notifications work, what they can trigger, and the specific patterns I see show up in exam scenarios. If you understand the moving parts here, a whole category of pipeline-design questions becomes straightforward.
When an object is uploaded to a bucket or changed in some way, Cloud Storage can be configured to generate a notification. That notification is the signal that something happened inside the bucket. On its own it does nothing useful. The value comes from wiring that signal into another GCP service that runs your logic in response.
The four trigger events you need to know are:

- OBJECT_FINALIZE: a new object is created, or an upload overwrites an existing object.
- OBJECT_DELETE: an object is permanently deleted.
- OBJECT_ARCHIVE: a live object becomes noncurrent (versioned buckets only).
- OBJECT_METADATA_UPDATE: an object's metadata changes.
Object creation is the most common case in real workloads and on the exam, but if you see a question about reacting to a deletion or a metadata change, the same mechanism applies. It is one feature with multiple event types, not several different features.
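To make that concrete, here is a small sketch of how a subscriber might branch on those event types. The attribute names (eventType, bucketId, objectId) follow the format Cloud Storage attaches to Pub/Sub notifications; the function name and descriptions are my own illustration, not an official API.

```python
# The four Cloud Storage notification event types, with short descriptions.
# Attribute names (eventType, bucketId, objectId) match the Pub/Sub
# notification format; describe_event is a hypothetical helper.

EVENT_TYPES = {
    "OBJECT_FINALIZE": "a new object (or overwriting upload) was created",
    "OBJECT_DELETE": "an object was permanently deleted",
    "OBJECT_ARCHIVE": "a live object became noncurrent (versioned buckets)",
    "OBJECT_METADATA_UPDATE": "an object's metadata changed",
}

def describe_event(attributes: dict) -> str:
    """Return a human-readable summary of a notification's attributes."""
    event_type = attributes["eventType"]
    if event_type not in EVENT_TYPES:
        raise ValueError(f"Unknown event type: {event_type}")
    return (f"{attributes['bucketId']}/{attributes['objectId']}: "
            f"{EVENT_TYPES[event_type]}")

print(describe_event({
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "my-bucket",
    "objectId": "raw/2024-01-01.csv",
}))
```

The same dispatch works for all four event types, which is the point: one mechanism, multiple events.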
From a single object upload, Cloud Storage can directly trigger:

- A Cloud Function, which runs your code in response to the event.
- A Pub/Sub topic, which receives a message describing the change and fans it out to subscribers.
- A Cloud Run service, which handles the event as an HTTP request.
These three destinations cover almost every exam scenario. If the question describes lightweight processing on each file, like compressing an image, converting a format, or running a validation script on a small dataset, Cloud Functions is the natural answer. If the workflow needs to fan out to multiple consumers, or you want to decouple producers and consumers, route the notification through Pub/Sub. If the work is heavier or already lives in a container, Cloud Run is the better fit.
One specific pattern shows up often enough that it is worth memorizing. You have a data pipeline orchestrated by Cloud Composer, and you want a DAG to run as soon as a new file lands in a bucket. The chain looks like this:

- A file lands in the Cloud Storage bucket.
- The notification triggers a Cloud Function.
- The Cloud Function calls the Airflow REST API on the Composer environment.
- The DAG runs and processes the new file.
The reason a Cloud Function sits in the middle is that Cloud Storage cannot call the Airflow REST API directly. The function is the glue. It receives the event, extracts the object metadata it needs, and makes the authenticated API call to Composer. From there the DAG owns the rest of the workflow.
If the exam describes a scenario where files arrive irregularly and you need a Composer DAG to run each time, this is the architecture to reach for. Polling on a schedule is the wrong answer because it either runs too often and wastes resources or runs too rarely and adds latency.
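A minimal sketch of the glue function's core, assuming the Airflow 2 stable REST API: build the request that triggers one DAG run, passing the object's location through the run's conf. The web server URL, DAG id, and helper name are hypothetical; in a real Cloud Function you would send this POST with an authenticated client (for example, ID-token credentials from google.auth).

```python
import json

# Sketch: construct the Airflow REST API call that triggers one DAG run.
# The endpoint shape follows the Airflow 2 stable API; the web server URL,
# DAG id, and function name below are hypothetical examples.

def build_dag_trigger_request(airflow_web_server: str, dag_id: str,
                              bucket: str, object_name: str) -> tuple[str, str]:
    """Return (url, json_body) for a POST that triggers a DAG run,
    passing the new object's location in the run's conf."""
    url = f"{airflow_web_server}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"bucket": bucket, "object": object_name}})
    return url, body

url, body = build_dag_trigger_request(
    "https://example-composer-ui.example.com", "ingest_new_file",
    "my-bucket", "raw/2024-01-01.csv")
print(url)
```

The DAG then reads the bucket and object name out of its run conf, so the same DAG handles every file without hardcoding paths.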
Notifications are the event-driven story, but Cloud Storage also shows up as a static input or output in pipelines that other services orchestrate. Three integrations come up:

- BigQuery, which loads data from and exports data to Cloud Storage.
- Dataflow, which reads input files from and writes results to Cloud Storage in batch pipelines.
- Dataproc, which uses Cloud Storage in place of HDFS for cluster input, output, and staging.
The distinction worth holding in your head is that these three are batch integrations. Cloud Storage is providing data to a job that runs on a schedule or on demand. Notifications, by contrast, are the event-driven layer that says something just happened and lets a downstream service react in near real time.
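To make the contrast concrete, here is a small sketch of the batch side: instead of reacting to an event, a scheduled job derives its input location from the run date. The bucket name and path layout are hypothetical.

```python
from datetime import date

# Sketch: a scheduled batch job builds its gs:// input URI from the run
# date rather than reacting to a notification. The bucket name and
# date-partitioned layout below are hypothetical.

def daily_input_uri(bucket: str, run_date: date) -> str:
    """gs:// wildcard matching one day's files under a dated prefix."""
    return f"gs://{bucket}/raw/{run_date:%Y/%m/%d}/*.json"

print(daily_input_uri("my-bucket", date(2024, 1, 1)))
# gs://my-bucket/raw/2024/01/01/*.json
```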
When you see a Professional Data Engineer question that mentions a file landing in Cloud Storage and a downstream system needing to run, the candidate answers usually narrow to three categories: the notification + Cloud Function path, the notification + Pub/Sub path, or the notification + Cloud Function + Composer path for orchestrated pipelines. Reading the question carefully for clues about scale, fan-out, and existing infrastructure usually tells you which one to pick.
If you can sketch the diagram from upload to downstream action without checking, you are in good shape for this section.
My Professional Data Engineer course covers Cloud Storage notifications, pipeline trigger patterns, and the rest of the GCS integrations you need for the exam.