
Machine learning shows up on the Professional Data Engineer exam in a narrower way than people expect. You are not asked to derive gradients or hand-tune optimizers. You are asked to recognize a small set of concepts that come up when you are building or maintaining ML pipelines on Google Cloud. Overfitting and regularization sit right at the top of that list, and within regularization you need to know the difference between L1 and L2. That is the entire scope of this article.
Overfitting happens when a model becomes too complex and starts memorizing the training data instead of learning the underlying patterns. It picks up noise and quirks that are not generalizable, and the result is a model that scores well on the training set and falls apart on new data. That gap between training performance and validation or test performance is the symptom you are looking for.
It helps to picture three points on a spectrum. An underfit model is too simple. It misses real patterns in the data and performs poorly on both training and test sets. An overfit model is too complex. It nails the training set, including the noise, and generalizes badly. The model you want sits between those two. It captures the genuine signal without chasing every wobble in the training data.
On the exam, the cue for overfitting is usually a question that describes a model with excellent training accuracy and poor validation accuracy, or a model whose performance degrades sharply once it sees production traffic. When you see that pattern, the answer almost always involves either more data, simpler architecture, or regularization.
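That diagnose-before-prescribing logic can be sketched as a tiny helper. The thresholds here are hypothetical illustrations, not exam-defined numbers:

```python
def diagnose(train_acc, val_acc, gap=0.10):
    """Classify a model's symptom from train/validation accuracy.

    gap: hypothetical train-vs-validation spread that signals overfitting.
    """
    if train_acc < 0.70:          # poor even on training data
        return "underfitting"
    if train_acc - val_acc > gap:  # great on training, poor on validation
        return "overfitting"
    return "ok"
```

Only the "overfitting" branch is the one where regularization is a candidate answer; the "underfitting" branch calls for a more expressive model or better features.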
Regularization adds a penalty for model complexity to the training loss in order to discourage overfitting. You are telling the optimizer that fitting the training data is not the only objective. Keeping the weights small or sparse is also part of the loss, so the model is pushed toward simpler solutions that tend to generalize better.
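As a minimal sketch in plain Python, the penalized loss is just the data loss plus one or both penalty terms, each scaled by its own strength hyperparameter:

```python
def penalized_loss(preds, targets, weights, l1=0.0, l2=0.0):
    """Mean squared error plus optional L1 and L2 penalties on the weights."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    l1_term = l1 * sum(abs(w) for w in weights)   # absolute-value penalty
    l2_term = l2 * sum(w * w for w in weights)    # squared penalty
    return mse + l1_term + l2_term
```

The optimizer minimizes this combined quantity, so large weights now cost something even when they improve the fit to the training data.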
The goal is balance. You want a model simple enough to focus on the most relevant features, and expressive enough to capture the real patterns in the data. Regularization is the dial that lets you trade between those two pressures without throwing away features or rewriting the architecture.
Where this matters in a Google Cloud workflow is anywhere you are training models. If you are using BigQuery ML, you can pass L1_REG and L2_REG options on CREATE MODEL statements. If you are using Vertex AI custom training with scikit-learn, TensorFlow, or XGBoost, regularization is exposed as a hyperparameter on the estimator or the layer. Vertex AI Vizier and hyperparameter tuning jobs will often sweep these values for you.
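In BigQuery ML, for example, the penalties are options on the CREATE MODEL statement. The dataset, table, and column names below are hypothetical:

```sql
CREATE OR REPLACE MODEL mydataset.churn_model
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned'],
  l1_reg = 0.1,   -- L1 penalty strength
  l2_reg = 0.5    -- L2 penalty strength
) AS
SELECT * FROM mydataset.training_data;
```

Setting either value to 0 disables that penalty; sweeping both is a common hyperparameter tuning setup.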
L1 regularization, also called Lasso, adds a penalty proportional to the absolute value of each weight. The important behavioral consequence is that L1 drives some weights all the way to zero. That is not a metaphor. Features whose weights hit zero have effectively been removed from the model.
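You can see the zeroing behavior in a small NumPy sketch using proximal gradient descent (gradient step on the squared error, then a soft-threshold step for the L1 penalty). The data here is synthetic: the third feature is pure noise, and L1 removes it exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
# y depends only on the first two features; the third is irrelevant
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)

def lasso_prox_gd(X, y, lam, lr=0.01, steps=2000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)          # gradient of the data loss
        w = w - lr * grad
        # soft-thresholding: the proximal step for the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w = lasso_prox_gd(X, y, lam=0.5)
# w[2] lands at exactly 0.0: the noise feature has been cut out
```

Note that the surviving weights are also shrunk somewhat below their true values of 3 and -2; L1 pays for sparsity with a bit of bias on the weights it keeps.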
This makes L1 useful in two distinct situations:

- Wide datasets with many candidate features, where you want the model to perform feature selection automatically instead of you curating columns by hand.
- Cases where a sparse, interpretable model matters, because a model that relies on a handful of nonzero weights is far easier to explain.
On the exam, the trigger phrases for L1 are things like "automatic feature selection", "sparse model", "reduce the number of features the model relies on", or "interpretability". If a question describes a wide table with hundreds of candidate columns and asks how to let the model decide which ones matter, L1 is the answer.
L2 regularization, also called Ridge, adds a penalty proportional to the square of each weight. Unlike L1, L2 shrinks weights toward zero but does not eliminate them. Every feature stays in the model, but the magnitude of each weight is constrained.
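The shrink-but-keep behavior shows up clearly with correlated inputs, where plain least squares is unstable. A closed-form ridge sketch on synthetic, nearly collinear data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge: w = (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge(X, y, lam=0.0)   # plain least squares: erratic weights
w_l2 = ridge(X, y, lam=10.0)   # ridge: both weights near 1, neither removed
```

Without the penalty, the two correlated columns can take wildly offsetting weights; with it, the weight spreads evenly across them and both stay in the model.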
This is the right choice when:

- The features are correlated with one another, since L2 handles multicollinearity by spreading weight across correlated inputs instead of arbitrarily picking one.
- You believe all of the inputs carry some signal and do not want any of them removed.
- You want to stabilize training without changing the feature set.
The exam cues for L2 are "correlated features", "multicollinearity", "shrink weights without removing features", and "all inputs are believed to be relevant". If a question describes correlated numeric inputs and asks how to make training more stable without changing the feature set, L2 is the pick.
The shortest mnemonic I use is that L1 cuts, L2 compresses. L1 is a pair of scissors that snips features out by sending their weights to zero. L2 is a clamp that squeezes every weight down without removing any. Once that picture is locked in, the trigger words map cleanly:

- "Feature selection", "sparse model", "interpretability", "let the model decide which features matter": L1.
- "Correlated features", "multicollinearity", "shrink without removing", "all inputs are relevant": L2.
And before either of those answers is right, the question has to actually be about overfitting. If the symptom is poor training accuracy, the model is underfitting and regularization will make it worse. Regularization only helps when training accuracy is high and validation accuracy is low. That sequence, diagnose first then prescribe, is what the Professional Data Engineer exam rewards.
My Professional Data Engineer course covers the full set of ML concepts that show up on the exam, including overfitting, regularization, and how these knobs are exposed in BigQuery ML and Vertex AI.