Data Fusion Studio for the PDE Exam: UI and Menu

GCP Study Hub
March 13, 2026

If you're preparing for the Professional Data Engineer exam, Cloud Data Fusion is one of those services where the questions usually aren't about syntax or commands. They're about what the tool looks like, what you build inside it, and how the pieces fit together. Studio is the part of Data Fusion you actually spend time in, so getting a clear mental picture of its interface and menu pays off on exam day.

I'll walk through what Studio is, how you navigate it, and what shows up in the plugin palette when you start designing a pipeline.

What Studio Is

Studio is the visual pipeline builder inside Cloud Data Fusion. You open it from the Integrate icon on the Data Fusion main console. It used to be called Pipeline Studio, which is honestly a more descriptive name because pipelines are the main thing you build there. Google renamed it to just Studio, but if you see older references to Pipeline Studio in documentation or practice questions, they're talking about the same surface.

The whole point of Studio is that you don't write code to build an ETL or ELT pipeline. You drag plugins onto a canvas, connect them with arrows, and configure each plugin through a properties panel. It's a drag-and-drop interface, full stop. For the Professional Data Engineer exam, if a question describes a no-code or low-code visual pipeline builder on GCP that runs on Dataproc under the hood, Data Fusion Studio is almost always the answer.
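
If it helps to make that concrete, here's a rough sketch, written as a Python dict with simplified and illustrative field names, of the spec a Studio pipeline exports to: every node on the canvas becomes a stage, and every arrow becomes a connection.

```python
# A sketch of an exported pipeline spec, assuming simplified field names.
# Each node dragged onto the canvas becomes an entry in "stages"; each
# arrow between nodes becomes an entry in "connections".
pipeline_spec = {
    "name": "orders_to_bigquery",
    "artifact": {"name": "cdap-data-pipeline", "scope": "SYSTEM"},
    "config": {
        "stages": [
            {"name": "Read Orders",  "plugin": {"name": "GCSFile",  "type": "batchsource"}},
            {"name": "Clean Fields", "plugin": {"name": "Wrangler", "type": "transform"}},
            {"name": "Load Orders",  "plugin": {"name": "BigQuery", "type": "batchsink"}},
        ],
        "connections": [
            {"from": "Read Orders",  "to": "Clean Fields"},
            {"from": "Clean Fields", "to": "Load Orders"},
        ],
    },
}
```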

The Main Data Fusion Navigation

Before you get to Studio itself, Data Fusion's main console has a left-hand navigation with a few key destinations. Knowing what each one does keeps you from confusing them in scenario questions.

  • Studio (Pipeline Studio) is reached through the Integrate icon. This is where you build pipelines visually.
  • Hub is the catalog of reusable assets. Prebuilt pipelines, plugins, drivers, and sample use cases live here. You pull things from the Hub into Studio rather than reinventing them.
  • Wrangler is the interactive data preparation interface. You point it at a sample of your data, click through transformations, and Wrangler generates the directives that become a Transform step inside a pipeline (there's a short sketch of what those directives look like just after this list). Wrangler is how non-engineers do data prep without writing Spark.
  • Replication is the dedicated surface for change data capture jobs. If you're moving rows out of a relational database into BigQuery on an ongoing basis, you set that up under Replication rather than building a custom pipeline in Studio.
  • Metadata tracks lineage and the technical and operational metadata Data Fusion captures as pipelines run. You go here when you need to answer questions about where a field came from or which pipelines touched a given dataset.
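
Since those directives come up a lot, here's a hedged sketch of the kind of recipe Wrangler produces. The directive grammar is real Wrangler syntax, but the columns and arguments are made up for illustration.

```python
# A sketch of a Wrangler recipe, assuming illustrative column names.
# Each line is one directive; Studio stores the whole recipe in the
# "directives" property of the Wrangler transform plugin.
directives = "\n".join([
    "parse-as-csv :body ',' true",    # split the raw line into columns
    "drop :body",                     # drop the original raw column
    "rename :body_1 :order_id",       # give the first parsed column a real name
    "rename :body_2 :order_total",
    "set-type :order_total double",   # cast the total from string to number
])
```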

On the exam, expect at least one question that maps a described workflow to the right surface. Interactive data prep means Wrangler. Database CDC means Replication. Lineage tracking means Metadata. A drag-and-drop ETL builder means Studio.

The Pipeline Canvas

Inside Studio, the screen is mostly taken up by the pipeline canvas. The canvas is where you arrange your pipeline graph. Each node is a plugin and each arrow is a connection that carries records from one stage to the next.

On the left of the canvas you get the plugin palette, which is grouped into categories. On the right you get a configuration panel that opens when you click a node. Along the top there's a toolbar with the controls you use most often, like preview, deploy, save, and the run history.

You don't have to memorize every button. The exam is more interested in the shape of the workflow: select plugins from the palette, drop them on the canvas, connect them, configure properties, validate, deploy, run.
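
That same workflow also exists outside the UI. Every Data Fusion instance exposes the underlying CDAP REST API, and a quick sketch of the deploy and run calls, with the instance endpoint, token, and file name as placeholders, shows what the toolbar buttons are actually doing.

```python
# A sketch of deploying and starting a pipeline through the CDAP REST API
# a Data Fusion instance exposes. Endpoint, token, and file name are
# placeholders; batch pipelines run as the DataPipelineWorkflow program.
import json
import requests

API = "https://<instance-api-endpoint>"   # the instance's apiEndpoint value
HEADERS = {"Authorization": "Bearer <access-token>"}
PIPELINE = "orders_to_bigquery"

# Deploy: upload the exported pipeline spec (what Studio saves on Deploy).
with open("orders_to_bigquery.json") as f:
    spec = json.load(f)
requests.put(f"{API}/v3/namespaces/default/apps/{PIPELINE}",
             headers=HEADERS, json=spec)

# Run: kick off the batch workflow, just like pressing Run in the toolbar.
requests.post(f"{API}/v3/namespaces/default/apps/{PIPELINE}"
              "/workflows/DataPipelineWorkflow/start",
              headers=HEADERS)
```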

The Plugin Palette: Sources, Transforms, Sinks, and More

The plugin palette is the menu on the left side of Studio. It's organized into categories that mirror the structure of a typical pipeline. Knowing these categories is the most testable part of the whole interface.

  • Source plugins are where data enters the pipeline. The list includes BigQuery, Cloud SQL, GCS, Dataplex, Pub/Sub, Spanner, and a long tail of relational databases and SaaS connectors.
  • Transform plugins reshape records as they flow through. Encoding, decoding, parsing, projection, field renaming, JavaScript-based transforms, and the Wrangler plugin that runs the directives you built interactively all live here.
  • Analytics plugins do the heavier shape-changing operations. Group By, Joiner, Deduplicate, Distinct, and Row Denormalizer are the ones to recognize. These are the operations that typically map to GROUP BY, JOIN, and DISTINCT in SQL but run on Dataproc under the hood.
  • Sink plugins are where records land. BigQuery is the most common, but you also see GCS, Cloud SQL, Dataplex, Spanner, Bigtable, and Pub/Sub. Sources and sinks largely mirror each other because most GCP storage services can play either role.
  • Conditions and Actions are the orchestration primitives inside a pipeline. A GCS file move, a BigQuery SQL execution as a side effect, or a conditional branch based on a previous stage's outcome all come from this category. This is how a Data Fusion pipeline does light workflow orchestration without needing Composer.
  • Error Handlers catch failed records and route them somewhere useful, like a dead-letter GCS bucket or an alerting Pub/Sub topic. You attach an error handler to a stage that can fail in a recoverable way.

If a Professional Data Engineer question describes a pipeline that reads from a Cloud SQL table, joins it to a BigQuery dimension, deduplicates, and writes to GCS with a failure alert, you should be able to picture exactly which plugin categories supply each piece.
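
One way to check yourself is to write the mapping out. This sketch uses the obvious plugin choices, which aren't the only valid ones, but the categories are the part the exam cares about.

```python
# Mapping each step of that scenario to the palette category it comes from.
# Plugin names are the obvious candidates, not a definitive list.
scenario = [
    ("read the Cloud SQL table",    "Source",        "CloudSQL source"),
    ("join the BigQuery dimension", "Analytics",     "Joiner (fed by a BigQuery source)"),
    ("deduplicate the joined rows", "Analytics",     "Deduplicate"),
    ("write the result to GCS",     "Sink",          "GCS sink"),
    ("alert on failed records",     "Error Handler", "error collector routed to Pub/Sub"),
]
for step, category, plugin in scenario:
    print(f"{step:30} -> {category:13} ({plugin})")
```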

How You Actually Design a Pipeline

The flow inside Studio is straightforward once you've seen it. You drag a source plugin onto the canvas, then click into it and configure the connection. You drag transform or analytics plugins next, connect the source's output arrow to each transform's input, and configure properties. You finish with one or more sinks, optionally wire in error handlers, and use Preview to run a small sample through the pipeline before deploying. Once deployed, the pipeline runs on an ephemeral Dataproc cluster that Data Fusion spins up behind the scenes.

That ephemeral Dataproc execution detail is worth committing to memory. Data Fusion pipelines aren't running on some special Data Fusion runtime. They compile down to Spark or MapReduce jobs on Dataproc, which is why pricing and performance questions on the exam often come back to Dataproc cluster sizing.
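
If you want to see where that sizing shows up, it's in the compute profile a run uses. Here's a hedged sketch of the runtime arguments you can pass when starting a run; system.profile.name is the standard way to pick a profile, while the sizing property names below are illustrative rather than exact.

```python
# A sketch of runtime arguments for a pipeline run, assuming the standard
# system.profile.name convention; the sizing property names below are
# illustrative, so check the Dataproc provisioner docs for the exact keys.
runtime_args = {
    "system.profile.name": "SYSTEM:dataproc",          # the ephemeral Dataproc profile
    "system.profile.properties.workerNumNodes": "4",   # cluster sizing knobs live
    "system.profile.properties.workerMemoryGB": "16",  # on the compute profile
}
```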

My Professional Data Engineer course covers Data Fusion's Studio, Hub, Wrangler, Replication, and Metadata surfaces alongside every other ingestion and orchestration service you'll see on the exam, so when a scenario question asks you to choose between Data Fusion, Dataflow, Dataproc, and Composer, you can map the described workflow to the right tool quickly.
