Importing Data Into BigQuery for the PDE Exam: UI, bq, Transfer Service, Streaming

GCP Study Hub
December 7, 2025

Importing data into BigQuery is one of those topics that looks simple on the surface and then turns into a cluster of small decisions on the Professional Data Engineer exam. The console makes it feel like a single button, but in practice there are at least five distinct paths into BigQuery, and the exam wants you to pick the right one based on the source, the cadence, and the consistency guarantees you need. I want to walk through each option the way it tends to come up on the test.

The Web UI: useful for one-off imports

When you create a new table in BigQuery through the Cloud Console, the UI gives you a surprisingly long list of source options. You can upload a local file, or pull from Cloud Storage, Google Drive, Bigtable, Azure Blob Storage, or Amazon S3. The supported file formats are CSV, JSONL (newline-delimited JSON), Avro, Parquet, and ORC.

During the import you can adjust the schema and table settings directly in the UI, including column data types, partitioning, and clustering. That makes the Web UI a reasonable choice for one-off loads or for ad-hoc exploration when you do not want to script anything. On the exam, if a question describes an analyst clicking through to load a single file and tweak column types, the Web UI is the answer they are looking for.

The bq command-line tool: repetitive and large loads

The bq command-line tool is what you reach for when the Web UI starts to feel slow. It is faster for repetitive tasks and larger data operations because you can script it, automate it, and chain it into other shell work. A typical load looks like this:

bq load --source_format=CSV dataset_id.table_id ./local_file.csv

You can swap the source format flag for AVRO, PARQUET, ORC, or NEWLINE_DELIMITED_JSON, and you can point at a local file or a Cloud Storage URI. If a Professional Data Engineer exam question describes an engineer who needs to upload local files on a schedule from their own machine, or run the same load logic across many files, the bq tool is usually the right call.
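
To make that concrete, here is a rough sketch of the same command pointed at Cloud Storage and at a different format; the dataset, table, and bucket names are placeholders rather than anything from a real project:

# Local CSV, letting BigQuery infer the schema instead of declaring it inline
bq load --source_format=CSV --autodetect mydataset.mytable ./local_file.csv

# Parquet files already sitting in Cloud Storage; the wildcard picks up many files in one job
bq load --source_format=PARQUET mydataset.mytable "gs://my-bucket/exports/*.parquet"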

BigQuery Data Transfer Service: managed and scheduled

The BigQuery Data Transfer Service is a managed service that automates data integration into BigQuery. The key word there is managed. You configure a source, a destination dataset, and a schedule, and Google handles the recurring transfer. The sources break down into a few categories:

  • Google services: Google Ads, Display and Video 360, and YouTube Channel data.
  • SaaS applications: Salesforce, ServiceNow, and Google Play.
  • External databases via JDBC: Oracle, Amazon Redshift, and Teradata.
  • Cloud storage services: Google Cloud Storage, Amazon S3, and Azure Blob Storage.

The Data Transfer Service also supports scheduled transfers, which is the differentiator on the exam. If a question mentions recurring imports from one of the supported sources, or uses words like scheduled, managed, or automated, the Data Transfer Service is almost always the intended answer over writing custom load jobs.
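
For a sense of what that configuration looks like from the command line, here is a minimal sketch for a Cloud Storage source. The dataset, bucket, and parameter values are illustrative, and the exact set of keys in --params depends on the data source you pick:

# Create a scheduled, managed transfer from Cloud Storage into an existing dataset
bq mk --transfer_config \
  --data_source=google_cloud_storage \
  --target_dataset=mydataset \
  --display_name="Daily CSV import" \
  --schedule="every 24 hours" \
  --params='{"data_path_template":"gs://my-bucket/exports/*.csv","destination_table_name_template":"events","file_format":"CSV"}'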

Streaming API vs Storage Write API

This is the comparison the exam loves. You may have to choose between the BigQuery Streaming API and the BigQuery Storage Write API, and the decision turns on delivery guarantees, throughput, and latency.

  • Streaming API: at-least-once delivery, which means duplicates are possible and you have to handle deduplication yourself. Lower throughput, lower latency. Good for real-time or near-real-time ingestion where immediate availability matters more than perfect accuracy.
  • Storage Write API: exactly-once delivery, so no duplicates. Higher throughput, higher latency. Good for scenarios where accuracy and consistency are critical, especially with high throughput, such as financial logs.

The shortcut on the exam: if the question stresses exactly-once, no duplicates, financial, or high throughput with consistency, pick the Storage Write API. If it stresses lowest latency, real-time, or accepts that duplicates can be handled downstream, pick the Streaming API.
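
On the tooling side there is a small asymmetry worth knowing: the legacy streaming path has a bq shortcut, while the Storage Write API is reached through the client libraries rather than the CLI. A hedged sketch of the legacy path, with placeholder names:

# Streams newline-delimited JSON rows through the legacy streaming API
# (bq insert is intended for testing, not for production pipelines)
bq insert mydataset.events ./rows.json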

Apache Hive data

Apache Hive data typically lives in Parquet, Avro, or ORC files, and all three formats load cleanly into BigQuery with no transformation step required. You have two patterns to choose from. You can load the Hive files into native BigQuery tables to optimize storage and processing, taking advantage of BigQuery's serverless pricing instead of running a Hadoop cluster. Or you can leave the files in Cloud Storage and query them from BigQuery as an external table. The external-table path is the right answer when the question emphasizes not moving the data or keeping the original storage in place.
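
A sketch of the external-table path, with made-up bucket, dataset, and table names:

# Describe the Parquet files that will stay in Cloud Storage
bq mkdef --source_format=PARQUET "gs://my-bucket/hive/warehouse/sales/*.parquet" > sales_def.json

# Create an external table over them; queries read the files in place, nothing is loaded
bq mk --table --external_table_definition=sales_def.json mydataset.sales_external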

Teradata imports

Teradata gets its own slide's worth of exam attention because the tooling is specific. The two terms to recognize:

  • JDBC: an API that allows Java applications to connect and interact with databases using SQL. It is the standardized, universal way to talk to Teradata.
  • FastExport: Teradata's high-speed tool for exporting large data volumes efficiently.

You can use JDBC and FastExport with or without the Data Transfer Service. The Data Transfer Service handles the scheduling and orchestration side, while JDBC and FastExport handle the actual extraction. If a question mentions Teradata at all, JDBC and FastExport are the keywords the exam wants you to recognize.

Ingestion timestamps

One last detail worth keeping in mind. When you are running incremental imports or ongoing update jobs, adding an ingestion timestamp to each record lets you maintain a historical view of the data. You can see exactly when each row was loaded, which makes time-based queries and historical analysis much cleaner. It is not something you would bother with for a one-off initial load, but for recurring updates it simplifies querying for both current and past data.
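
One way to wire that in, sketched with hypothetical staging and history tables: stamp CURRENT_TIMESTAMP() on each incremental batch as it moves from staging into the main table.

# Each run copies the newly staged rows and records when they arrived
bq query --use_legacy_sql=false '
INSERT INTO mydataset.sales_history (order_id, amount, ingested_at)
SELECT order_id, amount, CURRENT_TIMESTAMP()
FROM mydataset.sales_staging'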

The Professional Data Engineer exam will not ask you to write a load job from scratch. It will ask you to pick the right tool given a source, a cadence, and a consistency requirement. If you can map each of these six paths to the cue words that point to it, the BigQuery ingestion questions become some of the more predictable points on the test.

My Professional Data Engineer course covers BigQuery ingestion paths and the rest of the data engineering surface area you need for the exam.
