
One of the first conceptual frameworks I drill into every Professional Data Engineer candidate is the spectrum of management levels on Google Cloud. The exam will not ask you a question titled "is this service managed or serverless," but it will absolutely ask you to pick the right service for a workload where the deciding factor is how much operational responsibility your team wants to carry. If you do not have a clean mental model for unmanaged, managed, and serverless, you will second-guess yourself on the easy questions and lose time you needed for the hard ones.
So let me lay out the framework the way I teach it, and then connect it to the kinds of decisions the PDE exam expects you to make.
Every GCP service sits somewhere on a gradient. On one end you have full responsibility for the machine. On the other end you write code and Google handles literally everything else. The question is not "which end is better." It is "which end fits this workload." Some teams need deep customization. Other teams want zero operational overhead so they can focus on shipping data pipelines.
Google groups this into three buckets: unmanaged, managed, and serverless (also called no-ops). Each bucket trades control for convenience in a different ratio.
With an unmanaged service, Google provides the raw infrastructure and you manage everything on top of it. The canonical example is Compute Engine, which is GCP's infrastructure-as-a-service offering. You get a virtual machine. After that, you are responsible for everything that runs on it: setting up the server, installing software, and maintaining the operating system.
The upside is maximum customizability. If you need a specific kernel version, a niche database engine, or a licensed piece of software that only runs on a particular Linux distro, unmanaged is where you go. The downside is exactly what you would expect. You own the operational overhead. For a data engineering team, that overhead usually only makes sense when you have a hard technical or licensing constraint that the managed services cannot satisfy.
With a managed service, GCP takes over the underlying infrastructure responsibilities. Google handles server setup, software installation, and OS maintenance. You still control how your application is configured and deployed, but you stop thinking about the box it runs on. The analogy I use is that Google builds and maintains the engine, and you decide how to drive the car.
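Good examples of managed (but not serverless) services are Cloud Bigtable and Dataproc, where Google runs the underlying machines but you still decide how many nodes or workers to provision.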
The key thing to internalize is that most services on GCP are at least managed. Managed is the baseline, not the premium tier. When you see a service name on the exam, your default assumption should be that Google is handling the infrastructure layer.
Serverless services raise the abstraction one level higher. GCP automatically manages infrastructure and servers, and that includes scaling. You do not provision capacity. You do not pick instance types. You write code or define a pipeline and Google runs it.
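One way to keep the three levels straight is to write them out as a responsibility matrix. The sketch below is a study aid, not anything Google publishes: the task names come from the descriptions above, while the `Level` enum, the `RESPONSIBILITY` table, and the `tasks_you_own` helper are hypothetical names chosen for illustration, and the table deliberately simplifies real services.

```python
from enum import Enum


class Level(Enum):
    UNMANAGED = "unmanaged"
    MANAGED = "managed"
    SERVERLESS = "serverless"


YOU, GOOGLE = "you", "Google"

# Rows are tasks, columns are management levels. This is an illustrative
# simplification of the framework in the text, not an official matrix.
RESPONSIBILITY = {
    "server setup": {
        Level.UNMANAGED: YOU, Level.MANAGED: GOOGLE, Level.SERVERLESS: GOOGLE},
    "software installation": {
        Level.UNMANAGED: YOU, Level.MANAGED: GOOGLE, Level.SERVERLESS: GOOGLE},
    "OS maintenance": {
        Level.UNMANAGED: YOU, Level.MANAGED: GOOGLE, Level.SERVERLESS: GOOGLE},
    # Managed-but-not-serverless services still ask you to provision capacity.
    "capacity and scaling": {
        Level.UNMANAGED: YOU, Level.MANAGED: YOU, Level.SERVERLESS: GOOGLE},
    # Your own code or pipeline definition is always yours.
    "application code and configuration": {
        Level.UNMANAGED: YOU, Level.MANAGED: YOU, Level.SERVERLESS: YOU},
}


def tasks_you_own(level: Level) -> list[str]:
    """Return the tasks your team carries at the given management level."""
    return [task for task, owners in RESPONSIBILITY.items() if owners[level] == YOU]


for level in Level:
    print(level.value, "->", tasks_you_own(level))
```

Reading the output top to bottom, the list of tasks you own shrinks at each step, which is the whole gradient in one picture.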
The serverless services that show up constantly in data engineering scenarios include Dataflow and BigQuery.
Here is the rule that catches a lot of candidates off guard: all serverless services are managed, but not all managed services are serverless. Cloud Bigtable is managed. It is not serverless, because you still provision nodes. Dataflow is managed and serverless, because you submit a job and walk away. Knowing which bucket a service falls into is exactly the kind of distinction the Professional Data Engineer exam tests, often indirectly, by describing a scenario where the team "wants to minimize operational overhead" or "does not want to manage cluster capacity."
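If it helps to see that rule as a constraint rather than a sentence, here is a minimal sketch. The `Service` class and its `bucket` property are hypothetical names made up for this illustration; the three example services and their classifications are the ones used in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    managed: bool
    serverless: bool

    def __post_init__(self):
        # Encodes the rule: serverless implies managed, never the reverse.
        if self.serverless and not self.managed:
            raise ValueError(f"{self.name}: a serverless service must also be managed")

    @property
    def bucket(self) -> str:
        """Name the most specific bucket the service falls into."""
        if self.serverless:
            return "serverless"
        return "managed" if self.managed else "unmanaged"


# Classifications as described in the text above.
SERVICES = [
    Service("Compute Engine", managed=False, serverless=False),
    Service("Cloud Bigtable", managed=True, serverless=False),
    Service("Dataflow", managed=True, serverless=True),
]

for service in SERVICES:
    print(f"{service.name}: {service.bucket}")
```

Trying to construct `Service("Example", managed=False, serverless=True)` raises a `ValueError`, which is the "all serverless services are managed" half of the rule. Cloud Bigtable constructing without complaint is the "not all managed services are serverless" half.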
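When you read a Professional Data Engineer scenario, the management level is usually hiding in two or three phrases. Watch for signals like a team that wants no infrastructure to manage, a team that is willing to size and tune its own cluster, or a hard technical or licensing constraint that only a self-managed VM can satisfy.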
The other thing the exam will do is mix services across all three levels in the answer choices. You might see a question where the right answer is Dataflow and a distractor is Dataproc on Compute Engine. Both can process data. The deciding factor is which management level fits the team described. If the scenario says the team wants no infrastructure to manage, Dataproc on Compute Engine is wrong even though it would technically work.
Practice reading scenarios for those signals before you read the answer choices. It saves time and keeps you from being seduced by a service you happen to know well.
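The habit of scanning for signals first can be sketched as a tiny phrase matcher. This is a toy for practice, not a way to answer real exam questions: the `SIGNALS` table and `suggest_level` function are hypothetical, the phrases are drawn from the examples in this article, and falling back to "managed" when nothing matches is an assumption based on the point that managed is the baseline.

```python
# Phrases that hint at a management level. Illustrative only; real
# scenarios are worded in many more ways than this table covers.
SIGNALS = {
    "serverless": (
        "minimize operational overhead",
        "does not want to manage cluster capacity",
        "no infrastructure to manage",
    ),
    "unmanaged": (
        "specific kernel version",
        "licensing constraint",
        "licensed software",
    ),
}


def suggest_level(scenario: str) -> str:
    """Return the management level hinted at by the scenario text."""
    text = scenario.lower()
    for level, phrases in SIGNALS.items():
        if any(phrase in text for phrase in phrases):
            return level
    # Assumption: with no explicit signal, treat managed as the baseline.
    return "managed"


print(suggest_level("The team wants to minimize operational overhead."))
print(suggest_level("The vendor requires a specific kernel version."))
```

The useful part is the order of operations: decide the level from the scenario wording, then look at which answer choices sit at that level.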
My Professional Data Engineer course covers the management-level framework in depth and walks through how to apply it across every major data service on the exam, including the tricky cases where Dataflow, Dataproc, and BigQuery overlap.