
When I sit down with someone who is starting their Professional Data Engineer prep, one of the first things I want them to be comfortable with is the on-premises versus cloud distinction. It sounds basic, and in a sense it is, but the exam leans on this framing constantly. Questions about cost models, scaling decisions, migration scenarios, and even some service selection problems all trace back to whether you actually understand what changes when a workload moves from a company's own data center to a provider like Google Cloud.
This article walks through that on-prem versus cloud contrast the way I teach it in my course, with the historical context that makes the differences click.
From roughly the 1960s through the early 2000s, if a company wanted to run software or store data, it bought the hardware itself. That meant servers, storage arrays, networking equipment, and a physical room to put them in. The room itself was not trivial. A real data center needs climate control, power redundancy, physical security, fire suppression, and a team of people to keep it running.
The defining trait of this setup is full control. The company owns the machines, the company decides what runs on them, and the company is responsible for every layer underneath the application. That control is real, and for some workloads it still matters. But the cost model is harsh. Hardware is a capital expenditure. You write a check up front, and that money is committed whether the servers run hot or sit idle.
Scaling is the next problem. If demand grows, you do not just click a button. You forecast, you order, you wait for delivery, you rack, you cable, you configure, and you hope your forecast was right. If demand grows faster than expected, you fall behind. If it grows slower, you have paid for capacity you do not use.
Maintenance is the third problem. Dedicated IT teams handle patches, firmware upgrades, failed drives, network reconfigurations, and capacity planning. Those teams are expensive, and their work is mostly invisible until something breaks.
The fourth problem is underutilization. Because companies have to size for peak load, infrastructure spends most of its life running below capacity. A server provisioned for the busiest hour of the busiest day is wasted for the other 8,759 hours of the year.
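To make that waste concrete, here is a back-of-the-envelope sketch in Python. The demand figures are invented purely for illustration; the only real number is the 8,760 hours in a year.

```python
# Utilization math for hardware sized to peak load.
# The demand numbers below are hypothetical.

HOURS_PER_YEAR = 24 * 365  # 8,760

peak_demand = 1_000     # requests/sec required at the busiest hour
average_demand = 120    # requests/sec averaged over the whole year

# On-prem forces you to buy for the peak, and that capacity runs all year.
provisioned = peak_demand
utilization = average_demand / provisioned
print(f"Average utilization of peak-sized hardware: {utilization:.0%}")  # 12%

# Capacity-hours paid for versus capacity-hours actually used.
paid = provisioned * HOURS_PER_YEAR
used = average_demand * HOURS_PER_YEAR
print(f"Capacity-hours paid for but unused: {1 - used / paid:.0%}")  # 88%
```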
Cloud computing flips the ownership model. Instead of sitting in the company's own facilities, the IT infrastructure sits in a provider's facilities and is accessed over the internet. Google Cloud, AWS, and Azure are the three big providers, and the customer pays for services rather than buying hardware.
The shift sounds small when you describe it in one sentence, but the downstream effects are large:

- The cost model flips from capital expenditure to operational expenditure. You pay for what you use, when you use it, instead of committing money up front.
- Scaling becomes a configuration change rather than a procurement cycle. Capacity can grow or shrink in minutes instead of months.
- Maintenance, patching, and failed hardware become the provider's responsibility instead of a dedicated IT team's.
- Underutilization stops being the customer's problem, because idle capacity is the provider's to absorb and resell.
The provider is doing the same work the on-prem IT team used to do, just at a scale that makes it economical, and the customer rents the capability instead of owning it.
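To put rough numbers on that cost-model flip, here is a second sketch. Every price and hour count is made up for illustration; the point is the shape of the tradeoff, not the figures.

```python
# Hypothetical capex-versus-opex comparison. All figures are invented.

# On-prem: buy the server up front and amortize it over its useful life.
# The annual cost is fixed whether the machine is busy or idle.
server_cost = 12_000            # purchase price in USD (hypothetical)
useful_life_years = 3
onprem_annual = server_cost / useful_life_years   # $4,000/year, fixed

# Cloud: pay an hourly rate only for the hours the workload actually runs.
hourly_rate = 0.50              # USD per hour (hypothetical)
busy_hours_per_year = 2_000
cloud_annual = hourly_rate * busy_hours_per_year  # $1,000/year, variable

print(f"On-prem (fixed):  ${onprem_annual:,.0f}/year")
print(f"Cloud (variable): ${cloud_annual:,.0f}/year")

# The break-even point: sustained, near-constant load can favor ownership.
breakeven_hours = onprem_annual / hourly_rate     # 8,000 hours/year
print(f"Break-even at about {breakeven_hours:,.0f} busy hours per year")
```

The break-even line is worth noticing: a workload that runs flat out all year can still be cheaper to own, which is exactly why the exam frames these as tradeoff questions rather than "cloud always wins."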
The Professional Data Engineer exam does not ask you to recite the history of computing. What it does ask, repeatedly, is to make decisions that depend on understanding the cost and scaling tradeoffs above. A few patterns to watch for:

- Scenarios that emphasize unpredictable or spiky demand are usually steering you toward elastic, pay-as-you-go services rather than fixed capacity.
- Mentions of upfront hardware purchases, long procurement lead times, or peak-load capacity planning signal that the question is testing the capital-versus-operational cost contrast.
- Migration scenarios often hinge on which responsibilities move to the provider and which stay with the customer.
- Service selection questions frequently reward knowing which services remove operational work entirely and which still leave infrastructure for you to manage.
When I am reviewing this material, I try to keep the contrast concrete. On-prem means the company owns the hardware and absorbs every cost and constraint that comes with ownership. Cloud means the company rents the capability and trades some control for elasticity, speed, and a different cost model. Almost every other concept on the Professional Data Engineer exam, from BigQuery's serverless model to Dataflow's autoscaling to Cloud Storage's pricing tiers, sits on top of that contrast.
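To see the serverless end of that contrast in code, here is a minimal sketch using the BigQuery Python client (the google-cloud-bigquery package), assuming default credentials and a configured project. What matters is what is missing: there is no cluster to size and no server to provision before the query runs.

```python
# Minimal BigQuery example: submit SQL against a public dataset.
# Assumes `pip install google-cloud-bigquery` and default credentials.
from google.cloud import bigquery

client = bigquery.Client()  # picks up project and credentials from the environment

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# No capacity planning happened here; Google allocates the compute
# behind the scenes and bills for the bytes the query scans.
for row in client.query(query).result():
    print(row["name"], row["total"])
```

The on-prem version of that same workload starts with someone forecasting query load and buying a database server for the peak.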
Once that foundation is solid, the rest of the curriculum lines up. You stop seeing each service as a standalone product and start seeing them as different answers to the same underlying question, which is what changes when the data center stops being yours.
My Professional Data Engineer course covers the on-prem to cloud transition in depth, then builds on it through the full set of GCP data services you need for the exam.