
Service accounts show up across the Professional Data Engineer exam because almost every data pipeline question involves one component calling another. A Cloud Function reads from BigQuery. A Dataflow job writes to Cloud Storage. An App Engine app triggers another service. None of those calls happen without an identity attached, and that identity is almost always a service account.
The Professional Data Engineer exam expects you to know the difference between the two kinds of service accounts in GCP, recognize them by their email format, and understand which IAM role lets you create and modify them. I want to walk through that in the same order I cover it in my course, because the distinctions are small but the exam likes to test them in scenario questions.
Think about a typical data workflow on GCP. You have an App Engine application that pulls data from BigQuery, processes it, writes results back to BigQuery, and occasionally triggers a Cloud Function to handle a side task. Every one of those hops needs an authenticated identity. You are not sitting at the console typing credentials for each call. You attach a service account to the App Engine app, grant that service account the IAM roles it needs on BigQuery and Cloud Functions, and the calls go through securely without human intervention.
The same logic applies to scheduled and automated work. Cron jobs, batch jobs, CI/CD pipelines, Cloud Composer DAGs, Dataflow jobs that run on a schedule. None of these have a human attached at runtime. Service accounts are how you give them an identity and a permission boundary.
If you keep that mental model in mind, the exam questions about service accounts get easier. The question is always some variation of which identity should this workload use, and what permissions should it have.
Google-managed service accounts are created automatically by GCP when you enable certain services or spin up certain resources. You did not ask for them. They appeared because the service needs an identity to run.
A few traits to memorize:

- GCP creates them automatically when you enable certain services or spin up certain resources; you never create them yourself.
- Their email addresses use Google-owned domains like developer.gserviceaccount.com and appspot.gserviceaccount.com.
- They often carry broad default permissions, such as the Editor role on the project for the Compute Engine default account.
Two examples that come up constantly:

- The Compute Engine default service account: <project-number>-compute@developer.gserviceaccount.com
- The App Engine default service account: <project-id>@appspot.gserviceaccount.com
Recognize those email formats on sight. If the exam shows you an email ending in developer.gserviceaccount.com or appspot.gserviceaccount.com, you are looking at a Google-managed account. That is a clue about what the workload is and what permissions it likely has by default.
User-managed service accounts are the ones you create yourself, and they are what you should be reaching for in real production data pipelines.
Their traits:

- You create them explicitly, name them, and manage their lifecycle yourself.
- They start with no permissions at all; every role they hold is one you granted.
- They are the standard choice for production workloads because you control exactly what they can access.
The email format is my-app-service-account@<project-id>.iam.gserviceaccount.com. The giveaway is the iam.gserviceaccount.com domain instead of developer or appspot.
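To make the email-format distinction concrete, here is a small sketch in plain Python (no GCP libraries, and the sample addresses are hypothetical) that classifies a service account by its domain, using the formats described above:

```python
def classify_service_account(email: str) -> str:
    """Classify a GCP service account email by its domain.

    Google-managed accounts use Google-owned domains such as
    developer.gserviceaccount.com (Compute Engine default) and
    appspot.gserviceaccount.com (App Engine default). User-managed
    accounts live under <project-id>.iam.gserviceaccount.com.
    """
    domain = email.rsplit("@", 1)[-1]
    if domain in ("developer.gserviceaccount.com", "appspot.gserviceaccount.com"):
        return "google-managed"
    if domain.endswith(".iam.gserviceaccount.com"):
        return "user-managed"
    return "unknown"


# Hypothetical addresses, matching the formats above.
print(classify_service_account("12345-compute@developer.gserviceaccount.com"))  # google-managed
print(classify_service_account("my-project@appspot.gserviceaccount.com"))       # google-managed
print(classify_service_account("my-app-service-account@my-project.iam.gserviceaccount.com"))  # user-managed
```

This is exactly the sight-recognition the exam expects: look at the domain, not the part before the @.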
The reason user-managed accounts are the right choice for production data work is least privilege. The Compute Engine default account, if left untouched, has the Editor role on the project. That is way too much for a Dataflow worker or a Cloud Function that only needs to read one BigQuery dataset and write to one Cloud Storage bucket. When the exam asks you how to follow least privilege for a pipeline, the answer is almost always create a dedicated user-managed service account with only the roles required.
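One way to internalize the least-privilege point is to model it directly. The sketch below (the role names are real GCP roles, but the bindings and account emails are hypothetical) flags any project-wide basic role held by a given member, which is the kind of audit you would run before trusting a default account in production:

```python
# Broad "basic" roles that violate least privilege for a pipeline identity.
BROAD_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def over_broad_roles(bindings: dict[str, set[str]], member: str) -> set[str]:
    """Return any broad basic roles held by `member` in the bindings."""
    return {
        role
        for role, members in bindings.items()
        if member in members and role in BROAD_ROLES
    }


# Hypothetical bindings: the Compute Engine default account still holds
# Editor, while a dedicated pipeline account holds only what it needs.
bindings = {
    "roles/editor": {"12345-compute@developer.gserviceaccount.com"},
    "roles/bigquery.dataViewer": {"pipeline-sa@my-project.iam.gserviceaccount.com"},
    "roles/storage.objectAdmin": {"pipeline-sa@my-project.iam.gserviceaccount.com"},
}

print(over_broad_roles(bindings, "12345-compute@developer.gserviceaccount.com"))
print(over_broad_roles(bindings, "pipeline-sa@my-project.iam.gserviceaccount.com"))
```

The default account trips the check; the dedicated account, holding only a dataset read role and a bucket write role, passes clean. That contrast is the shape of the correct exam answer.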
To create and modify user-managed service accounts you need the Service Account Admin role. This is a specific role and the exam likes to test that you know it.
Service Account Admin lets you:

- Create new service accounts in a project.
- Update and delete existing service accounts.
- Manage the IAM policy on a service account itself, for example granting another user permission to use it.
Note what it does not do on its own. Service Account Admin lets you manage the lifecycle of service accounts, but granting roles to a service account on other resources, like a BigQuery dataset or a Cloud Storage bucket, requires the appropriate admin role on those resources too. The exam sometimes wraps a scenario question around exactly this distinction.
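The distinction above can be captured in a tiny lookup. This is a simplified illustration, not an exhaustive IAM reference: the role names are real, but the action strings and the mapping itself are my own shorthand for the exam scenario:

```python
# Which admin role gates which action. Service Account Admin covers the
# lifecycle of the account itself; granting the account roles elsewhere
# needs admin rights on the target resource instead.
REQUIRED_ROLE = {
    "create service account": "roles/iam.serviceAccountAdmin",
    "delete service account": "roles/iam.serviceAccountAdmin",
    "grant role on BigQuery dataset": "roles/bigquery.admin",
    "grant role on Cloud Storage bucket": "roles/storage.admin",
}


def required_role(action: str) -> str:
    """Return the admin role that gates `action`, per the mapping above."""
    return REQUIRED_ROLE.get(action, "unknown")


print(required_role("create service account"))             # roles/iam.serviceAccountAdmin
print(required_role("grant role on Cloud Storage bucket"))  # roles/storage.admin
```

If a scenario question says an engineer with Service Account Admin cannot grant the new account access to a bucket, this mapping is why: the grant lives on the bucket, not on the service account.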
If you remember three things from this article going into the Professional Data Engineer exam, make it these:

1. There are two kinds of service accounts: Google-managed accounts that GCP creates automatically, and user-managed accounts that you create yourself.
2. You can tell them apart by email domain: developer.gserviceaccount.com and appspot.gserviceaccount.com mean Google-managed, while <project-id>.iam.gserviceaccount.com means user-managed.
3. The Service Account Admin role manages the lifecycle of service accounts, but granting a service account roles on other resources requires the appropriate admin role on those resources.
My Professional Data Engineer course covers IAM, service accounts, and the rest of the security and identity material the exam draws from.