
One of the cleanest, most predictable concepts on the Generative AI Leader exam is the machine learning development lifecycle. Google explicitly flags it as fair game on the test, and the question pattern almost always boils down to ordering. You see a scenario, and you have to identify which stage comes next, or which stage was skipped, or which stage produced the artifact being described. If you internalize the six stages and the two sequencing rules I cover below, this category becomes free points.
I am Ben Makansi, and in this post I want to walk through the lifecycle the way it is framed for the Generative AI Leader exam specifically. There are longer, more academic versions of this lifecycle floating around, but the exam uses a six-stage model. Memorize these six.
Here is the full sequence:

1. Data Ingestion
2. Data Preparation
3. Train/Test Split
4. Model Training
5. Model Evaluation
6. Model Deployment and Monitoring
Order matters. The exam is going to test whether you know what comes before what, so let me go through each stage and explain the role it plays.
Data Ingestion is the collection phase. You gather data from various sources, which can include databases, files, or APIs. The objective is to pull together relevant, quality data that will eventually be used to train the model. Nothing in this stage involves cleaning or transforming the data. It is purely about getting the raw material in one place.
If a Generative AI Leader exam scenario describes a team pulling sales records out of BigQuery and dropping them into a Cloud Storage bucket so they can be used downstream, that is Data Ingestion.
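To make that concrete, here is a minimal sketch of what that ingestion step could look like with the google-cloud-bigquery client. The project, table, and bucket names are placeholders I made up for illustration.

```python
from google.cloud import bigquery

# Data Ingestion: copy raw records out of BigQuery into a Cloud Storage
# bucket so downstream stages work from one place. No cleaning happens here.
# The table and bucket names are hypothetical placeholders.
client = bigquery.Client()

extract_job = client.extract_table(
    "my-project.sales.raw_orders",           # source table (placeholder)
    "gs://my-ingest-bucket/raw_orders.csv",  # destination in Cloud Storage
)
extract_job.result()  # block until the export job finishes
```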
Raw data is rarely usable as-is. Data Preparation is the stage where you clean and transform what you ingested. This includes work like removing duplicates, normalizing values, and shaping the data into a form a model can consume. The quality of this work directly determines model performance, which is why the exam treats it as a distinct stage rather than lumping it in with ingestion.
One of the two sequencing rules Google calls out is that Data Preparation comes after Data Ingestion. You have to collect the data before you can clean it. That sounds obvious in plain English, but exam questions will scramble the wording and try to get you to pick an answer where preparation happens first.
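To give a rough sense of what preparation looks like in code, here is a small pandas sketch. The file and column names are hypothetical, and min-max scaling is just one of many possible normalization choices.

```python
import pandas as pd

# Data Preparation: clean and reshape the ingested data before any splitting.
# File and column names are hypothetical placeholders.
df = pd.read_csv("raw_orders.csv")

df = df.drop_duplicates()                             # remove duplicate records
df = df.dropna(subset=["order_total"])                # drop rows missing a key value
df["region"] = df["region"].str.strip().str.lower()   # normalize categorical values

# Min-max scale a numeric column into [0, 1]
df["order_total_scaled"] = (
    (df["order_total"] - df["order_total"].min())
    / (df["order_total"].max() - df["order_total"].min())
)

df.to_csv("prepared_orders.csv", index=False)
```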
Once your data is processed, you divide it into two parts. One part is used to train the model, and the other part is held back to test how the model performs on data it has never seen. The typical split is around 70 to 80 percent for training and 20 to 30 percent for testing.
The second sequencing rule Google emphasizes is that the Train/Test Split comes after Data Preparation. You do not split raw data, you split processed data. If a scenario describes a team splitting their dataset before they have cleaned it, that is the wrong order, and the Generative AI Leader exam will give you an answer choice that flags exactly that mistake.
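In scikit-learn terms, the split looks something like the sketch below: an 80/20 split on the prepared data, with the feature and label names as placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Train/Test Split: split the *prepared* data, never the raw data.
df = pd.read_csv("prepared_orders.csv")
X = df.drop(columns=["churned"])   # features (label name is a placeholder)
y = df["churned"]                  # label

# 80 percent for training, 20 percent held back for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```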
Model Training is where the actual learning happens. You feed the training portion of the dataset into the machine learning algorithm and let the model pick up patterns. Validation is also part of this stage: you tune parameters and refine the model so its accuracy improves before you move on to a final, held-out evaluation.
Validation is not the same as the final evaluation. Validation is iterative and happens inside the training loop. It is how you decide which version of the model to keep.
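Here is a minimal sketch of that idea, assuming scikit-learn and a random forest purely for illustration: carve a validation set out of the training portion, sweep a parameter, and keep whichever version scores best, all without touching the test set.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Model Training: carve a validation set out of the *training* data;
# the test set from the previous stage stays untouched.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

best_model, best_score = None, 0.0
for n_trees in (50, 100, 200):                        # simple parameter sweep
    model = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    model.fit(X_tr, y_tr)
    score = accuracy_score(y_val, model.predict(X_val))
    if score > best_score:                            # keep the best version
        best_model, best_score = model, score
```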
Model Evaluation uses the test data you set aside back in stage three to assess the model's performance on data it has never seen. Accuracy, precision, and recall are common metrics, but the right choice depends on what the model is being asked to do. The point of this stage is to decide whether the model is ready for production.
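Continuing the same sketch, the final evaluation runs the chosen model against the held-out test set exactly once. This assumes a binary label, which is what makes precision and recall meaningful with their default settings.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Model Evaluation: one pass over the held-out test data from stage three.
y_pred = best_model.predict(X_test)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
```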
If the model fails evaluation, you go backward in the lifecycle. You might revisit Data Preparation, redo the Train/Test Split with a different ratio, or retrain with different parameters. The lifecycle is not strictly linear in practice, but for exam purposes you should treat the forward order as the canonical sequence.
Once the model passes evaluation, it gets deployed into a production environment where it can make predictions on new data. Deployment alone is not the end. Monitoring is bundled into the same stage because a deployed model needs continuous oversight. Data drifts. User behavior changes. Inputs the model has never seen show up in production. Without monitoring, model performance degrades silently.
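As one illustration of what monitoring can mean in practice, here is a hedged sketch of a basic drift check: compare the distribution of a production feature against the training distribution with a two-sample Kolmogorov-Smirnov test. The threshold and the synthetic data are arbitrary stand-ins.

```python
import numpy as np
from scipy.stats import ks_2samp

# Monitoring sketch: flag drift when a production feature's distribution
# diverges from what the model saw in training. The threshold is arbitrary.
def check_drift(train_values, prod_values, alpha=0.01):
    _stat, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha  # True means "distribution shifted, investigate"

# Toy data standing in for real feature values; the production mean has shifted
train_feature = np.random.normal(loc=0.0, scale=1.0, size=5000)
prod_feature = np.random.normal(loc=0.4, scale=1.0, size=5000)

if check_drift(train_feature, prod_feature):
    print("Data drift detected: consider retraining or revisiting preparation.")
```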
On the Generative AI Leader exam, watch for scenarios where a team deploys a model and then walks away. The implied gap is monitoring, and the correct answer almost always involves adding observability.
If you remember nothing else from this post, remember these two ordering rules from the Generative AI Leader curriculum:

1. Data Preparation comes after Data Ingestion.
2. The Train/Test Split comes after Data Preparation.
Those two rules are the most likely targets for ordering questions, and knowing them cold lets you eliminate a lot of the trickier wrong answers.
The ML development lifecycle sits in the foundational portion of the Generative AI Leader exam, which means questions on it tend to be straightforward but unforgiving. Either you know the order or you do not. There is not much room for partial credit through reasoning. So treat this as a memorization exercise, lock in the six stages, and move on.
My Generative AI Leader course covers the full ML development lifecycle alongside the rest of the foundational material you need for the exam.