Google Cloud SDK and gcloud CLI for the PDE Exam

July 14, 2025

When I talk to people studying for the Professional Data Engineer exam, the gcloud CLI is one of those topics that feels almost too basic to spend time on. It is not a flashy data service like BigQuery or Dataflow, and it does not have a clever distributed-systems story behind it. But Google still puts gcloud questions on the exam, and they tend to be the kind where one wrong flag or one missing setup step is the difference between a correct answer and a near-miss distractor. So it is worth getting the fundamentals locked in.

In this article I want to walk through what the Google Cloud SDK actually contains, what gcloud does, how configurations work, and the handful of commands that are most likely to show up on a Professional Data Engineer question.

What is inside the Google Cloud SDK

The Google Cloud SDK is a bundle, not a single tool. When you install it, you get a few things together:

gcloud is the main command line interface. This is what you use to manage projects, compute resources, IAM, networking, and most other Google Cloud services.
bq is the command line tool for interacting with BigQuery. Queries, dataset and table management, loading data, exporting data, all of it.
gsutil is the command line tool for Cloud Storage. Copying objects, managing buckets, setting lifecycle rules.
Client libraries let you do the same things programmatically from your own applications in Python, Java, Node.js, Go, and a few others.

On exam questions, the trick is often choosing the right tool from this list. If a scenario asks how to script a BigQuery load job, the answer is usually bq load, not gcloud. If the scenario is about copying a large dataset between buckets, the answer is gsutil. And if the workflow needs to run inside an application that already uses Python, the right answer is a client library rather than shelling out to gcloud.
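
To make the tool choice concrete, here is a minimal sketch of the first two scenarios. The dataset, table, and bucket names are hypothetical placeholders, and the flags are one reasonable choice rather than the only correct one.

```shell
# Hypothetical names: my_dataset.events, my-src-bucket, my-dst-bucket.

# BigQuery load job from Cloud Storage: this is bq territory, not gcloud.
load_events() {
  bq load --source_format=CSV --autodetect \
    my_dataset.events "gs://my-src-bucket/events/*.csv"
}

# Bulk copy between buckets: gsutil, with -m to parallelize the transfer.
copy_events() {
  gsutil -m cp -r "gs://my-src-bucket/events" "gs://my-dst-bucket/"
}
```

Neither function touches gcloud at all, which is the point of the distinction.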

What gcloud can actually do

gcloud can do almost anything the Cloud Console can do, and a few things the console cannot. A handful of examples that are useful to recognize on the exam:

gcloud app deploy deploys an application to App Engine.
gcloud projects create my-project-id --name="My Project" creates a new project.
gcloud compute instances list lists every Compute Engine VM in the current project.
gcloud logging read "resource.type=gce_instance AND severity>=ERROR" --limit=10 pulls up to 10 of the most recent Compute Engine log entries at ERROR severity or higher.

That last one is the flavor of command Google likes to test. You do not need to memorize the entire syntax, but you should be able to look at a gcloud command and know roughly what surface area it touches.
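
If it helps to see the pieces of that logging command separated out, here is a small sketch that wraps it in a function. The function name and the chosen output columns are my own assumptions. The filter itself is the one from the example above.

```shell
# Sketch: fetch recent Compute Engine log entries at ERROR severity or above.
# The first argument (entry count) defaults to 10; the function name is hypothetical.
recent_gce_errors() {
  limit="${1:-10}"
  gcloud logging read \
    'resource.type="gce_instance" AND severity>=ERROR' \
    --limit="$limit" \
    --freshness=1d \
    --format='table(timestamp, severity, resource.labels.instance_id)'
}
```

Reading it piece by piece: the quoted string is the log filter, --limit caps the number of entries, --freshness bounds how far back to look, and --format only changes how the results are printed.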

Installing the SDK

Installation is straightforward. You can grab the platform-specific package from cloud.google.com/sdk/docs/install, or use your operating system's package manager. On Debian-based Linux that means apt-get, and on macOS it means Homebrew. I usually go with Homebrew on macOS because it keeps the SDK in line with my other tools and updates are a single command.
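
Whichever route you take, a quick sanity check afterwards is to confirm the bundled tools are actually on your PATH. The helper below is a hypothetical sketch that uses only standard shell features, so it works before gcloud is configured.

```shell
# Print each named tool that is not found on PATH.
# Returns 1 if any tool is missing, 0 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Usage: missing_tools gcloud bq gsutil
```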

The two steps before gcloud will do anything

Most functional gcloud commands need two things set up before they will run. This is one of the most testable corners of the topic because it shows up in scenario questions about why a script or a CI job is failing.

Step 1 is authentication. Run gcloud auth login and it opens a browser to log in with your Google account. Without this, gcloud has no credentials and most commands will refuse to execute.

Step 2 is setting the default project. Run gcloud config set project [PROJECT_ID] with the actual project ID. From then on, every command runs in the context of that project unless you explicitly override it with a --project flag.

If an exam question shows a gcloud command failing with an authentication or permission error, the answer is almost always one of these two steps missing, or the wrong account being active.
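
For scripts and CI jobs, both steps can be checked up front. This is a minimal sketch, assuming a hypothetical function name and my own wording for the messages. It only reads state and changes nothing.

```shell
# Sketch of a preflight check for scripts and CI jobs (function name is hypothetical).
preflight() {
  # Step 1: is there an active, authenticated account?
  account=$(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)
  # Step 2: is a default project set?
  project=$(gcloud config get-value project 2>/dev/null)

  if [ -z "$account" ]; then
    echo "no active account: run 'gcloud auth login'" >&2
    return 1
  fi
  if [ -z "$project" ]; then
    echo "no default project: run 'gcloud config set project PROJECT_ID'" >&2
    return 1
  fi
  echo "account=$account project=$project"
}
```

Printing the active account also covers the third failure mode, where you are authenticated but as the wrong identity.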

Configurations

gcloud configurations are named bundles of settings. They are useful when you work across multiple projects or environments, because you can swap them in and out instead of editing properties one at a time.

Creating one looks like this:

gcloud config configurations create prod
gcloud config set project my-prod-project
gcloud config set compute/zone us-central1-a
gcloud config set compute/region us-central1

Then later, to switch to that configuration:

gcloud config configurations activate prod

You might have a prod configuration, a dev configuration, and a personal sandbox configuration, and switch between them with one command. The Professional Data Engineer exam will sometimes phrase a question around switching between environments cleanly, and configurations are the right answer when that comes up.
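
As a sketch of how that might look in a setup script, the two helpers below create a configuration and run a one-off command against a configuration without changing the active one. The function names and the example values are hypothetical. The --configuration flag is a standard gcloud-wide flag.

```shell
# Sketch: create a named configuration and fill in its core properties.
# Arguments (all hypothetical examples): name, project ID, region, zone.
# Creating a configuration also activates it by default, so the
# "config set" calls that follow apply to the new configuration.
make_config() {
  gcloud config configurations create "$1" &&
  gcloud config set project "$2" &&
  gcloud config set compute/region "$3" &&
  gcloud config set compute/zone "$4"
}

# Run one command against a configuration without switching the active one.
with_config() {
  cfg="$1"; shift
  gcloud --configuration="$cfg" "$@"
}

# Usage:
#   make_config dev my-dev-project us-central1 us-central1-a
#   with_config prod compute instances list
```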

Other commands worth recognizing

A few more commands show up often enough that I always make sure people studying recognize them:

gcloud init walks you through initial setup, including authentication, the default project, and the default compute region and zone. It is what you typically run first on a new machine.
gcloud projects list shows every project tied to your account.
gcloud services list shows the APIs enabled in the current project. Useful when a job is failing because an API is not enabled.
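gcloud projects describe [PROJECT_ID] returns metadata like the project number, lifecycle state, parent, and labels. Billing information is not part of this output; that comes from gcloud billing projects describe [PROJECT_ID].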
gcloud compute instances list gives you a quick view of VMs in the project.
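
The disabled-API case is easy to script around. Here is a minimal sketch, with hypothetical function names, that checks whether an API is enabled in the current project and enables it only if it is missing.

```shell
# Sketch: succeed only if the named API is enabled in the current project.
# -x matches the whole line, -F treats the API name as a literal string.
api_enabled() {
  gcloud services list --enabled --format='value(config.name)' \
    | grep -qxF "$1"
}

# Enable the API only when it is missing.
ensure_api() {
  api_enabled "$1" || gcloud services enable "$1"
}

# Usage: ensure_api dataflow.googleapis.com
```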

None of this is the deep substance of the Professional Data Engineer exam, but it is the connective tissue that lets you talk about every other topic. If you cannot get authenticated, set a project, and run a command, none of the BigQuery or Dataflow knowledge matters.

My Professional Data Engineer course covers the Google Cloud SDK, gcloud configurations, and the bq and gsutil tools you will need across the data pipeline topics.