IaaS, PaaS, SaaS and GCP Service Models for the PDE Exam

GCP Study Hub
May 30, 2025

One of the foundational mental models I rely on when teaching the Professional Data Engineer exam is the cloud service model spectrum: IaaS, PaaS, and SaaS. The exam itself does not ask you to define these acronyms in a vacuum, but it constantly asks you to pick the right Google Cloud service for a given workload, and that choice is almost always about how much infrastructure you want to manage versus how much you want Google to manage for you. Get the spectrum right in your head and a surprising number of scenario questions become easier.

The three service models

Infrastructure as a Service (IaaS) gives you virtualized computing resources online. You rent the servers, storage, and networking, and you keep full control of the operating system, the runtime, and everything you install on top. Google Compute Engine is the canonical GCP example. Amazon EC2 and Azure VMs are the equivalents on the other clouds. IaaS is the right fit when you need a custom environment, a specific kernel version, a licensed piece of software with strict install requirements, or a lift-and-shift of an on-prem workload that is not ready to be refactored.

Platform as a Service (PaaS) hands you a managed platform for building, running, and managing applications. The operating system, runtime, scaling, and patching are Google's problem, and you bring the code. Google App Engine, Cloud Run, and Cloud Functions all sit in this space. AWS Elastic Beanstalk and Azure App Service are the analogues. PaaS is the right fit when you want to ship application logic without thinking about VMs, and when your workload fits the shape the platform expects (stateless web apps, event-driven functions, containerized services).

Software as a Service (SaaS) is the finished product. You log in and use the application. Google Workspace, Salesforce, and Microsoft 365 are the classic examples. There is no infrastructure, no platform, no code. From a data engineering perspective, SaaS usually shows up as a source system you are pulling data out of, not something you are building on top of.

The arrow that matters: as you move from IaaS to PaaS to SaaS, abstraction goes up and management burden goes down. You give up some control in exchange for someone else doing the operational work. That tradeoff is the entire point.

Where GCP data services land on the spectrum

Most of the GCP services that show up on the Professional Data Engineer exam are either IaaS or PaaS, with PaaS being the heavy majority. Here is how I bucket the ones you will see most often:

  • Compute Engine is IaaS. You manage the VM, the OS, and what runs on it.
  • Kubernetes Engine (GKE) is a hybrid. The control plane is managed for you, but you still think about nodes, pools, and cluster configuration. Autopilot mode pushes it further toward PaaS.
  • App Engine, Cloud Run, and Cloud Functions are PaaS. You ship code or a container and Google handles the rest.
  • BigQuery is PaaS, often called serverless. There are no clusters to size for queries (slots are a capacity abstraction, not a server). You write SQL and Google runs it.
  • Dataflow is PaaS. You write an Apache Beam pipeline and Google manages the workers, autoscaling, and shuffle.
  • Dataproc sits closer to IaaS. It is managed Hadoop and Spark, but you still pick machine types, cluster size, and lifecycle. Dataproc Serverless moves it toward PaaS.
  • Pub/Sub is PaaS. There are no brokers to manage.
  • Cloud Storage, Spanner, Bigtable, and Cloud SQL are all managed storage and database services. Cloud SQL is the most VM-shaped of the group because you still pick an instance size and version. Spanner and Bigtable are more abstracted.
  • Composer is managed Apache Airflow. It is PaaS, but you still think about environment sizing.
  • Data Fusion is a managed visual ETL service built on CDAP. PaaS.
  • Vertex AI is a suite of PaaS services for training, tuning, and serving models.
  • Cloud IAM and Cloud Build are supporting PaaS services that you will interact with constantly.
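The bucketing above fits in a small lookup table. Here is a sketch in Python; the labels follow my groupings from the list, not any official Google classification, and "hybrid" is my shorthand for services that sit between IaaS and PaaS depending on how you configure them:

```python
# Rough service-model buckets for common PDE-exam services,
# following the groupings in the list above.
SERVICE_MODEL = {
    "Compute Engine": "IaaS",
    "GKE": "hybrid",       # managed control plane, but you still manage nodes
    "Dataproc": "hybrid",  # managed Hadoop/Spark, but you size the cluster
    "App Engine": "PaaS",
    "Cloud Run": "PaaS",
    "Cloud Functions": "PaaS",
    "BigQuery": "PaaS",
    "Dataflow": "PaaS",
    "Pub/Sub": "PaaS",
    "Cloud Storage": "PaaS",
    "Spanner": "PaaS",
    "Bigtable": "PaaS",
    "Cloud SQL": "PaaS",   # the most VM-shaped of the managed databases
    "Composer": "PaaS",
    "Data Fusion": "PaaS",
    "Vertex AI": "PaaS",
    "Google Workspace": "SaaS",
}

def management_burden(service: str) -> str:
    """One-line reminder of who manages what for a given service."""
    model = SERVICE_MODEL.get(service, "unknown")
    notes = {
        "IaaS": "you manage the OS, runtime, and everything on top",
        "hybrid": "Google manages part; you still size and configure the rest",
        "PaaS": "you bring code or config; Google runs the platform",
        "SaaS": "finished product; you just log in and use it",
        "unknown": "not in this cheat sheet",
    }
    return f"{service}: {model} ({notes[model]})"
```

Calling `management_burden("BigQuery")` gives `"BigQuery: PaaS (you bring code or config; Google runs the platform)"`, which is roughly the level of recall the exam expects.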

How this shows up on the Professional Data Engineer exam

The exam rarely asks you to label a service IaaS or PaaS directly. What it does ask, over and over, is which service is the right choice given a set of constraints. The service-model lens is the fastest way to narrow the answers.

If the scenario emphasizes operational simplicity, no ops team, serverless, autoscaling, or low maintenance, you should lean toward the most managed option that fits. That usually means BigQuery over Dataproc, Cloud Run over Compute Engine, Pub/Sub over a self-managed Kafka cluster, Spanner or Bigtable over Cloud SQL on a VM. If the scenario emphasizes full control, a specific software stack, an existing Hadoop or Spark codebase, or a lift-and-shift, the answer leans toward IaaS or the less-abstracted PaaS option, which often means Compute Engine or Dataproc.
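That narrowing heuristic can be sketched as a toy function. The keyword sets and head-to-head pairings below are illustrative examples I chose to mirror the paragraph above, not anything from Google's exam guide:

```python
# Toy decision helper: count constraint keywords that signal "lean managed"
# versus "lean control", then pick from a head-to-head pair accordingly.
# Keyword lists and pairings are illustrative only.
MANAGED_SIGNALS = {"serverless", "no ops team", "autoscaling",
                   "low maintenance", "operational simplicity"}
CONTROL_SIGNALS = {"full control", "specific software stack",
                   "existing hadoop", "existing spark", "lift-and-shift"}

# (most-managed choice, less-abstracted choice) for common matchups
PAIRS = {
    "analytics": ("BigQuery", "Dataproc"),
    "app hosting": ("Cloud Run", "Compute Engine"),
    "messaging": ("Pub/Sub", "self-managed Kafka on Compute Engine"),
}

def recommend(workload: str, constraints: set) -> str:
    """Pick the managed option unless control signals dominate."""
    managed, control = PAIRS[workload]
    managed_hits = len(constraints & MANAGED_SIGNALS)
    control_hits = len(constraints & CONTROL_SIGNALS)
    return control if control_hits > managed_hits else managed
```

So `recommend("analytics", {"serverless", "no ops team"})` returns `"BigQuery"`, while `recommend("analytics", {"existing spark", "lift-and-shift"})` returns `"Dataproc"`. Real exam questions are fuzzier than a keyword count, but the tie-breaking direction (default to the most managed option that fits) matches how the exam is scored.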

Watch for distractors that throw in an IaaS option when the scenario clearly calls for serverless, and the other way around. The exam loves to test whether you can resist the urge to pick the service you know best and instead pick the one that matches the constraints.

One more thing worth internalizing: not every GCP service appears on the Professional Data Engineer exam. The blueprint focuses on data ingestion, storage, processing, analytics, ML, and the security and orchestration glue around those. When you see a service that does not fit that scope in an answer choice, that is usually a signal it is a distractor.

My Professional Data Engineer course walks through each of the data services above at exam depth, including when to pick Dataflow over Dataproc, BigQuery over Bigtable, and Cloud Run over App Engine in the scenarios the exam actually asks about.
