
BigQuery best practices are one of the highest-yield study areas for the Professional Data Engineer exam. The exam loves to put you in a scenario where a query is too slow, a bill is too high, or a pipeline is wasting money on streaming inserts, and then ask you to pick the option that fixes it without breaking anything else. If you internalize a handful of patterns across query design, ingestion, and storage, a surprising number of questions answer themselves.
Here is how I think about each layer when I am preparing candidates for the Professional Data Engineer.
BigQuery charges you for bytes scanned on on-demand queries, so the cheapest query is the one that touches the fewest columns and the fewest rows. That single sentence drives most of the query best practices you need for the exam.
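To make that concrete, here is a back-of-the-envelope cost model. The $6.25-per-TiB on-demand rate is an assumption (pricing varies by region and changes over time; check the current BigQuery pricing page), but the shape of the math is what matters: cost scales with bytes scanned, so a narrow column list on the same table is proportionally cheaper.

```python
# Rough cost model for on-demand queries, which are billed by bytes scanned.
# The $6.25-per-TiB rate is an assumption; check current BigQuery pricing.
TIB = 2**40
ON_DEMAND_RATE_PER_TIB = 6.25

def query_cost(bytes_scanned: int) -> float:
    """Estimated dollars for an on-demand query scanning this many bytes."""
    return bytes_scanned / TIB * ON_DEMAND_RATE_PER_TIB

# SELECT * over a 10 TiB table scans every column; selecting 2 of 50
# equally sized columns scans roughly 4% of the bytes.
full_scan = query_cost(10 * TIB)            # 62.5
narrow    = query_cost(10 * TIB * 2 // 50)  # 2.5
```

This is also why `SELECT *` is almost never the right answer on the exam: it pays for every column whether or not the query uses it.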
On top of that, partitioning and clustering belong in your mental toolkit. Partitioning a table by ingestion time or by a date column lets WHERE clauses prune entire partitions, which is the single biggest cost lever in BigQuery. Clustering on commonly filtered columns sorts data within partitions so range and equality filters scan even less. When the exam describes a query that filters on a date and a customer ID against a giant table, the right answer is usually partition on the date, cluster on the customer ID.
The ingestion best practices come down to one principle the exam hammers on: streaming ingest is expensive.
Streaming inserts are the right call when you genuinely need data to be queryable within seconds of arrival. They are the wrong call for nightly ETL, hourly rollups, or anything where a few minutes of latency is acceptable. Batch loads into BigQuery are free. If a scenario describes a workload that ingests at a regular cadence and does not need second-level freshness, batch loading is the cost-optimized answer.
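The cost gap is easy to quantify. The $0.05-per-GB streaming rate below is an assumption based on legacy streaming-insert pricing (the Storage Write API prices differently, and rates change); the structural point the exam cares about is that the batch column is zero.

```python
# Rough monthly ingest cost comparison. The $0.05-per-GB streaming rate is
# an assumed figure based on legacy streaming-insert pricing; batch loads
# from Cloud Storage are free. Check current pricing before relying on it.
STREAMING_RATE_PER_GB = 0.05

def monthly_ingest_cost(gb_per_day: float, streaming: bool) -> float:
    if not streaming:
        return 0.0  # batch loads use the free shared slot pool
    return gb_per_day * 30 * STREAMING_RATE_PER_GB

# A 100 GB/day pipeline that only needs hourly freshness:
streamed = monthly_ingest_cost(100, streaming=True)   # 150.0
batched  = monthly_ingest_cost(100, streaming=False)  # 0.0
```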
File format matters too. Loading Avro or Parquet from Cloud Storage is faster than loading CSV or JSON: both are splittable, self-describing binary formats, so BigQuery can parse blocks in parallel and does not need to infer or coerce the schema. (Avro is actually row-oriented and Parquet columnar; for load performance, what matters is that both carry their schema with the data.) If the exam offers a choice between formats for a batch load, Avro or Parquet is usually correct.
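A tiny stdlib experiment shows why self-describing binary beats text formats at load time. This is not Avro or Parquet, just the underlying idea: when the schema is known up front, decoding is a direct memory operation, whereas CSV has to be text-parsed and every cell coerced to the right type.

```python
import csv
import io
import struct

# Toy illustration (not actual Avro/Parquet): a binary column with a known,
# fixed-width schema decodes in one shot, while CSV must be text-parsed and
# each cell coerced to the right type.
values = list(range(1000))

# CSV path: text parsing plus per-cell type coercion
buf = io.StringIO()
csv.writer(buf).writerows([v] for v in values)
parsed = [int(row[0]) for row in csv.reader(io.StringIO(buf.getvalue()))]

# Binary path: the schema (little-endian int64) is known up front,
# so decoding is a single struct call with no inference
packed = struct.pack(f"<{len(values)}q", *values)
decoded = list(struct.unpack(f"<{len(values)}q", packed))

assert parsed == decoded == values
```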
One more ingestion-level decision is shape. Nested and repeated fields (STRUCT and ARRAY columns) let you denormalize: related records live inside the parent row, which reduces the need for joins at query time. BigQuery is built for this. If you are tempted to normalize a schema across many small tables, ask whether the resulting joins will cost more than the storage you save. The Professional Data Engineer exam frequently rewards the denormalized answer.
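The trade-off can be sketched in plain Python with made-up field names. Both shapes answer "total quantity per customer," but the normalized pair of tables needs a join first, while the nested shape answers it in a single pass over one table, which is the access pattern BigQuery's nested and repeated fields are designed for.

```python
# Toy contrast between a normalized schema (join required) and a nested,
# denormalized shape (single scan). Field names are invented for illustration.
orders = [{"order_id": 1, "customer": "c1"}, {"order_id": 2, "customer": "c2"}]
items = [
    {"order_id": 1, "sku": "a", "qty": 2},
    {"order_id": 1, "sku": "b", "qty": 1},
    {"order_id": 2, "sku": "a", "qty": 5},
]

# Normalized: join items back to orders before aggregating
by_order = {o["order_id"]: o["customer"] for o in orders}
joined_total: dict[str, int] = {}
for it in items:
    cust = by_order[it["order_id"]]          # the join step
    joined_total[cust] = joined_total.get(cust, 0) + it["qty"]

# Denormalized: line items nested inside each order row, no join needed
nested = [
    {"customer": "c1", "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"customer": "c2", "items": [{"sku": "a", "qty": 5}]},
]
nested_total = {row["customer"]: sum(i["qty"] for i in row["items"])
                for row in nested}

assert joined_total == nested_total  # same answer, one scan instead of a join
```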
BigQuery storage has two automatic tiers. Data in a table or partition that has not been modified for 90 consecutive days drops to long-term storage pricing, which is roughly half the cost of active storage. You do not have to do anything to get this discount, but you can lose it by touching a table unnecessarily, since any modification resets the 90-day clock on the affected table or partition. If a question describes archival data being rewritten on a schedule and asks why storage costs are not dropping, the answer is that the rewrites are resetting the 90-day clock.
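The tiering rule reduces to a single comparison. The rates below are assumptions chosen to show the roughly-half relationship (real prices vary by region); the behavior to remember is the threshold and the reset.

```python
from datetime import date

# Assumed rates for illustration: active storage at $0.02 per GiB-month,
# long-term at $0.01 (roughly half). Real prices vary by region. Any
# modification resets the 90-day clock on the affected table or partition.
ACTIVE_RATE, LONG_TERM_RATE = 0.020, 0.010

def monthly_storage_rate(last_modified: date, today: date) -> float:
    days_untouched = (today - last_modified).days
    return LONG_TERM_RATE if days_untouched >= 90 else ACTIVE_RATE

today = date(2024, 6, 1)
# Archival table left alone since January: long-term pricing applies
untouched = monthly_storage_rate(date(2024, 1, 1), today)   # 0.010
# Same data, but a scheduled rewrite touched it in May: clock reset
rewritten = monthly_storage_rate(date(2024, 5, 20), today)  # 0.020
```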
If you can answer three questions for any BigQuery scenario the exam throws at you, you will get the vast majority of BigQuery questions right: which columns am I scanning, am I filtering before the scan, and is this data shaped to avoid joins?
My Professional Data Engineer course covers BigQuery best practices in depth alongside the rest of the storage, processing, and machine learning topics the exam tests, with worked examples for the query, ingest, and storage patterns described here.