
Service accounts show up across the Professional Data Engineer exam because almost every data pipeline question involves one component calling another. A Cloud Function reads from BigQuery. A Dataflow job writes to Cloud Storage. An App Engine app triggers another service. None of those calls happen without an identity attached, and that identity is almost always a service account.
The Professional Data Engineer exam expects you to know the difference between the two kinds of service accounts in GCP, recognize them by their email format, and understand which IAM role lets you create and modify them. I want to walk through that in the same order I cover it in my course, because the distinctions are small but the exam likes to test them in scenario questions.
Think about a typical data workflow on GCP. You have an App Engine application that pulls data from BigQuery, processes it, writes results back to BigQuery, and occasionally triggers a Cloud Function to handle a side task. Every one of those hops needs an authenticated identity. You are not sitting at the console typing credentials for each call. You attach a service account to the App Engine app, grant that service account the IAM roles it needs on BigQuery and Cloud Functions, and the calls go through securely without human intervention.
The same logic applies to scheduled and automated work. Cron jobs, batch jobs, CI/CD pipelines, Cloud Composer DAGs, Dataflow jobs that run on a schedule. None of these have a human attached at runtime. Service accounts are how you give them an identity and a permission boundary.
If you keep that mental model in mind, the exam questions about service accounts get easier. The question is always some variation of which identity should this workload use, and what permissions should it have.
Google-managed service accounts are created automatically by GCP when you enable certain services or spin up certain resources. You did not ask for them. They appeared because the service needs an identity to run.
A few traits to memorize:

- GCP creates them automatically when you enable certain services or spin up certain resources; you never create them yourself.
- Their email addresses use Google-owned domains like developer.gserviceaccount.com and appspot.gserviceaccount.com.
- They often carry broad default permissions, such as the Editor role on the project for the Compute Engine default account.
Two examples that come up constantly:

- The Compute Engine default service account: <project-number>-compute@developer.gserviceaccount.com
- The App Engine default service account: <project-id>@appspot.gserviceaccount.com
Recognize those email formats on sight. If the exam shows you an email ending in developer.gserviceaccount.com or appspot.gserviceaccount.com, you are looking at a Google-managed account. That is a clue about what the workload is and what permissions it likely has by default.
User-managed service accounts are the ones you create yourself, and they are what you should be reaching for in real production data pipelines.
Their traits:

- You create them explicitly, name them, and manage their lifecycle yourself.
- They start with no permissions at all; every role they hold is one you granted.
- They are the standard choice for production workloads because you control exactly what they can access.
The email format is my-app-service-account@<project-id>.iam.gserviceaccount.com. The giveaway is the iam.gserviceaccount.com domain instead of developer or appspot.
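To make the email-format distinction concrete, here is a small sketch in plain Python (no GCP libraries, and the sample addresses are hypothetical) that classifies a service account by its domain, using the formats described above:

```python
def classify_service_account(email: str) -> str:
    """Classify a GCP service account email by its domain.

    Google-managed accounts use Google-owned domains such as
    developer.gserviceaccount.com (Compute Engine default) and
    appspot.gserviceaccount.com (App Engine default). User-managed
    accounts live under <project-id>.iam.gserviceaccount.com.
    """
    domain = email.rsplit("@", 1)[-1]
    if domain in ("developer.gserviceaccount.com", "appspot.gserviceaccount.com"):
        return "google-managed"
    if domain.endswith(".iam.gserviceaccount.com"):
        return "user-managed"
    return "unknown"


# Hypothetical addresses, matching the formats above.
print(classify_service_account("12345-compute@developer.gserviceaccount.com"))  # google-managed
print(classify_service_account("my-project@appspot.gserviceaccount.com"))       # google-managed
print(classify_service_account("my-app-service-account@my-project.iam.gserviceaccount.com"))  # user-managed
```

This is exactly the sight-recognition the exam expects: look at the domain, not the part before the @.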
The reason user-managed accounts are the right choice for production data work is least privilege. The Compute Engine default account, if left untouched, has the Editor role on the project. That is way too much for a Dataflow worker or a Cloud Function that only needs to read one BigQuery dataset and write to one Cloud Storage bucket. When the exam asks you how to follow least privilege for a pipeline, the answer is almost always create a dedicated user-managed service account with only the roles required.
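One way to internalize the least-privilege point is to model it directly. The sketch below (the role names are real GCP roles, but the bindings and account emails are hypothetical) flags any project-wide basic role held by a given member, which is the kind of audit you would run before trusting a default account in production:

```python
# Broad "basic" roles that violate least privilege for a pipeline identity.
BROAD_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def over_broad_roles(bindings: dict[str, set[str]], member: str) -> set[str]:
    """Return any broad basic roles held by `member` in the bindings."""
    return {
        role
        for role, members in bindings.items()
        if member in members and role in BROAD_ROLES
    }


# Hypothetical bindings: the Compute Engine default account still holds
# Editor, while a dedicated pipeline account holds only what it needs.
bindings = {
    "roles/editor": {"12345-compute@developer.gserviceaccount.com"},
    "roles/bigquery.dataViewer": {"pipeline-sa@my-project.iam.gserviceaccount.com"},
    "roles/storage.objectAdmin": {"pipeline-sa@my-project.iam.gserviceaccount.com"},
}

print(over_broad_roles(bindings, "12345-compute@developer.gserviceaccount.com"))
print(over_broad_roles(bindings, "pipeline-sa@my-project.iam.gserviceaccount.com"))
```

The default account trips the check; the dedicated account, holding only a dataset read role and a bucket write role, passes clean. That contrast is the shape of the correct exam answer.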
To create and modify user-managed service accounts you need the Service Account Admin role. This is a specific role and the exam likes to test that you know it.
Service Account Admin lets you:

- Create new service accounts in a project.
- Update and delete existing service accounts.
- Manage the IAM policy on a service account itself, for example granting another user permission to use it.
Note what it does not do on its own. Service Account Admin lets you manage the lifecycle of service accounts, but granting roles to a service account on other resources, like a BigQuery dataset or a Cloud Storage bucket, requires the appropriate admin role on those resources too. The exam sometimes wraps a scenario question around exactly this distinction.
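The distinction above can be captured in a tiny lookup. This is a simplified illustration, not an exhaustive IAM reference: the role names are real, but the action strings and the mapping itself are my own shorthand for the exam scenario:

```python
# Which admin role gates which action. Service Account Admin covers the
# lifecycle of the account itself; granting the account roles elsewhere
# needs admin rights on the target resource instead.
REQUIRED_ROLE = {
    "create service account": "roles/iam.serviceAccountAdmin",
    "delete service account": "roles/iam.serviceAccountAdmin",
    "grant role on BigQuery dataset": "roles/bigquery.admin",
    "grant role on Cloud Storage bucket": "roles/storage.admin",
}


def required_role(action: str) -> str:
    """Return the admin role that gates `action`, per the mapping above."""
    return REQUIRED_ROLE.get(action, "unknown")


print(required_role("create service account"))             # roles/iam.serviceAccountAdmin
print(required_role("grant role on Cloud Storage bucket"))  # roles/storage.admin
```

If a scenario question says an engineer with Service Account Admin cannot grant the new account access to a bucket, this mapping is why: the grant lives on the bucket, not on the service account.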
If you remember three things from this article going into the Professional Data Engineer exam, make it these:

1. There are two kinds of service accounts: Google-managed accounts that GCP creates automatically, and user-managed accounts that you create yourself.
2. You can tell them apart by email domain: developer.gserviceaccount.com and appspot.gserviceaccount.com mean Google-managed, while <project-id>.iam.gserviceaccount.com means user-managed.
3. The Service Account Admin role manages the lifecycle of service accounts, but granting a service account roles on other resources requires the appropriate admin role on those resources.
My Professional Data Engineer course covers IAM, service accounts, and the rest of the security and identity material the exam draws from.