
Cloud Storage questions on the Professional Data Engineer exam love to mix four features together in one scenario: lifecycle rules, object versioning, retention policies, and Autoclass. They all touch the same bucket, they sometimes conflict, and the exam will absolutely write a question where two of them are technically possible and only one is correct. I want to walk through each one the way I think about them when I'm reading a question, so you can pick the right tool quickly.
Lifecycle rules let you automate what happens to objects in a bucket based on conditions like age. The two actions you need to know cold are SetStorageClass and Delete. SetStorageClass moves an object to a cheaper class once a condition is met, and Delete removes it entirely. The condition is usually age in days, but it can also key off things like number of newer versions or the current storage class.
The reason lifecycle rules show up so often on the Professional Data Engineer exam is that they are the canonical answer for cost optimization on Cloud Storage. If a question says raw event logs land in Standard, get queried heavily for the first week, and then are rarely touched after 30 days, the answer is a lifecycle rule that transitions to Nearline at 30 days, Coldline at 90, and Archive at a year. If the question adds a compliance angle like data must be purged after seven years, you stack a Delete action on top of the transitions.
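As a rough sketch of what that looks like in practice, here is a lifecycle config for that exact scenario and the gsutil command that applies it. The bucket name and the lifecycle.json filename are placeholders, and I'm expressing seven years as 2555 days:

{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
      "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
      "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}
    },
    {
      "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
      "condition": {"age": 365, "matchesStorageClass": ["COLDLINE"]}
    },
    {
      "action": {"type": "Delete"},
      "condition": {"age": 2555}
    }
  ]
}

gsutil lifecycle set lifecycle.json gs://BUCKET_NAME

The matchesStorageClass conditions keep each transition rule from firing on objects that have already moved further down the tiers.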
The key trait to remember: lifecycle rules are policy-driven and predictable. You declare the rules, and Cloud Storage applies them on a schedule. You are deciding the transitions, not reacting to access patterns.
Object versioning keeps a history of changes to the objects in a bucket. When versioning is enabled and you replace an object with the same name, the old object does not disappear. It becomes a noncurrent version in the same bucket. When you delete a versioned object, the live version is not actually removed either; it simply becomes a noncurrent version you can restore later.
You enable it with a one-liner:
gsutil versioning set on gs://BUCKET_NAME

The exam framing for versioning is almost always recovery. Accidental overwrites, accidental deletes, the ability to roll back a corrupted file. If the scenario emphasizes protecting against human or pipeline errors, versioning is the right call.
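As a quick sketch of the recovery side (the bucket and object names here are made up): gsutil ls -a lists every version of an object with its generation number appended, and copying a noncurrent generation back over the live name rolls the object back.

gsutil ls -a gs://BUCKET_NAME/pipeline/output.csv
gsutil cp gs://BUCKET_NAME/pipeline/output.csv#1700000000000000 gs://BUCKET_NAME/pipeline/output.csv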
One thing that catches people: noncurrent versions still cost money. You almost always pair versioning with a lifecycle rule that deletes noncurrent versions after some number of days, otherwise the bucket grows forever. The exam will sometimes hand you a question where storage costs are climbing on a versioned bucket and expect you to spot the missing lifecycle rule for noncurrent versions.
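A sketch of that cleanup rule, assuming a 30-day recovery window is acceptable. The daysSinceNoncurrentTime condition only ever matches noncurrent versions, so the live object is never touched:

{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceNoncurrentTime": 30}
    }
  ]
}

gsutil lifecycle set noncurrent-cleanup.json gs://BUCKET_NAME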
A retention policy sets a minimum duration during which objects in a bucket cannot be deleted or replaced. It applies retroactively to every object already in the bucket and to every new object that arrives. The use case is compliance, financial records, legal holds, healthcare data, anything where regulation says you must keep the data for N years.
Bucket lock is the next level up. Locking the retention policy makes it permanent. Once locked, the policy cannot be removed and the retention duration cannot be shortened. You can extend it, but you can never pull it back. A locked retention policy also blocks bucket deletion until every object has aged past the retention period. This is what auditors want to see.
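In command form, a minimal sketch for a seven-year requirement (the bucket name is a placeholder). Setting the policy is reversible; locking it is not, which is why the lock command makes you confirm:

gsutil retention set 7y gs://BUCKET_NAME
gsutil retention lock gs://BUCKET_NAME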
The critical exam trap here: retention policies and object versioning cannot be used at the same time on the same bucket. If a question describes a regulated workload that needs both "keep all historical versions" and "prevent deletion for seven years," pick the retention policy. Versioning is about recoverability, retention is about enforcement, and you have to pick one. If the scenario screams compliance, regulatory audit, or immutable records, the answer is retention policy with bucket lock.
Autoclass is the hands-off option. You enable it on a bucket, and Cloud Storage moves objects between Standard, Nearline, Coldline, and Archive based on actual access patterns. The transitions are roughly 30 days of no access to Nearline, then 90 days to Coldline and 365 days to Archive if the bucket's terminal storage class is set to Archive (the default terminal class stops at Nearline). If an object gets accessed, it bumps back up to Standard.
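Enabling it is a single bucket setting. A sketch with the gcloud storage CLI (the bucket name is a placeholder; the terminal storage class flag is what opts the bucket into the Coldline and Archive tiers rather than stopping at Nearline):

gcloud storage buckets update gs://BUCKET_NAME \
  --enable-autoclass \
  --autoclass-terminal-storage-class=ARCHIVE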
Autoclass is great when access patterns are unpredictable or unknown. The catch, and this is what the Professional Data Engineer exam loves to test, is that Autoclass always starts objects in Standard. If you already know the data will be cold from day one, Autoclass leaves you paying Standard rates during the warm-up window, and a manual lifecycle rule that transitions straight to Nearline or Coldline will be cheaper.
The rule of thumb I use on the exam: known access pattern means lifecycle rule, unknown or shifting access pattern means Autoclass.
When a question puts these features in front of you, I run through a short mental checklist:
- Cost optimization with a known access pattern? Lifecycle rules with SetStorageClass and Delete.
- Recovering from accidental overwrites or deletes? Object versioning, paired with a rule that cleans up noncurrent versions.
- Compliance, regulators, immutable records? Retention policy, locked if the requirement is permanent.
- Unpredictable or unknown access patterns? Autoclass.
Most Cloud Storage scenario questions resolve cleanly once you map the scenario to one of those buckets. The features overlap less than they look like they do, and the exam is usually testing whether you noticed the constraint that rules out two of the four options.
My Professional Data Engineer course covers Cloud Storage lifecycle management, versioning, retention, and Autoclass in the depth you need for the exam, along with the rest of the storage and data services on the blueprint.