
When a Professional Cloud Architect exam scenario asks you to introduce machine learning into an architecture, the question is almost never about which algorithm to pick. It is about which Vertex AI tier to reach for. Google Cloud organizes its ML offerings along a spectrum of control versus convenience, and the exam expects you to read a scenario, identify where it lands on that spectrum, and recommend the right service. Get this mapping wrong and you either over-engineer a problem that a pretrained API would have solved in an afternoon, or you under-engineer one that needed real custom training infrastructure.
I'm Ben Makansi, founder of GCP Study Hub, and in this article I want to break down the three Vertex AI tiers, what each one actually looks like in practice, and the cues in a Professional Cloud Architect exam scenario that tell you which tier the question is testing.
Vertex AI exposes three ways to put a model into production, and they correspond to three different levels of work an architect signs up for.
Pretrained APIs are models Google has already trained on massive datasets and exposed as REST endpoints. You send an image, you get back labels. You send audio, you get back transcribed text. You send a chunk of text, you get back entities, sentiment, or a translation. There is no training step on your end. There is no model to deploy. You are calling a service.
AutoML sits in the middle. You bring your own dataset, but you do not write training code. You upload tabular data, images, or text into a managed dataset, point at the column you want to predict, and Vertex AI handles the rest. It does the feature engineering, searches across model architectures, tunes hyperparameters, evaluates the resulting models, and deploys the best one to an endpoint with a single click. The output is a model trained on your data, but you never wrote the training loop.
Custom Training is the full-control tier. You write the training script. You pick the framework, whether that is PyTorch, TensorFlow, scikit-learn, or XGBoost. You either use a Google-provided pre-built container or build your own and push it to Artifact Registry. You configure the machine type, boot disk, and accelerators like GPUs or TPUs. You decide whether to run on a single node or distribute across multiple VMs. Vertex AI handles the orchestration and scaling, but every architectural decision about the model is yours.
The Custom Training workflow has a specific shape that the Professional Cloud Architect exam expects you to recognize, even if you never write the code yourself.
The starting point is data, which can come from Cloud Storage, BigQuery, or a managed dataset inside Vertex AI. From there, you provide a training script. That script either runs inside a pre-built container with a framework already installed, or inside a custom container image that you have built and pushed to Artifact Registry. The custom container is what you reach for when you need a specific library version, a non-standard framework, or any custom dependency the pre-built containers do not cover.
Once the training image exists, you specify the compute environment. This is where you set the machine type, the boot disk size, and whether you want accelerators. For deep learning workloads, this is where GPUs or TPUs get attached. Then you choose the execution mode, which is either single-node training on one VM, or distributed training across multiple VMs for jobs that are too large for a single machine.
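To make that concrete, here is a rough sketch of a custom training job in the Vertex AI Python SDK. The project, bucket, and image names are placeholders I made up, not values from any specific scenario:

from google.cloud import aiplatform

# Placeholders: swap in your own project, region, and staging bucket.
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-bucket/staging",
)

# Training code packaged as a custom container pushed to Artifact Registry.
job = aiplatform.CustomContainerTrainingJob(
    display_name="demand-forecast-training",
    container_uri="us-central1-docker.pkg.dev/my-project/ml/trainer:latest",
)

# The compute environment: machine type, boot disk, and accelerators.
# Raising replica_count switches from single-node to distributed training.
job.run(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
    boot_disk_size_gb=200,
    replica_count=1,
)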
When the job finishes, the trained model can be pushed into the Vertex AI Model Registry, which tracks model versions and metadata. From the registry, the model is deployed to an endpoint, which is the REST API that downstream applications call for online predictions.
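A minimal sketch of those last two steps, assuming the training job wrote a TensorFlow SavedModel to Cloud Storage; the paths and display names here are illustrative:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Register the trained artifacts in the Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="demand-forecast",
    artifact_uri="gs://my-bucket/model-output/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
)

# Deploy the registered model to an endpoint for online predictions.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0]])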
The architectural takeaway is that GCP handles infrastructure orchestration, scaling, and the model registry, while the team retains full control over the training logic and the hardware profile.
AutoML is the no-code path to the same end state: a model deployed to an endpoint.
You start with a managed dataset, the same way you would with custom training. The data can be tabular, image, or text depending on the use case. You specify the prediction target column, and from there the AutoML pipeline takes over.
Inside the pipeline, feature engineering runs first. Things like normalization and encoding happen automatically. Next is model search, where AutoML tries multiple algorithms to identify strong contenders. Hyperparameter tuning runs on those contenders. Then training runs on managed compute resources you do not provision yourself. Finally, the system evaluates the trained models and selects the one that performs best on the validation set.
The selected model lands in the Vertex AI Model Registry just like a custom-trained model would, and from there it is deployed to an endpoint with a one-click action.
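In SDK terms, the whole pipeline compresses to a few calls. Here is a hedged sketch for a tabular classification problem; the dataset path, target column, and training budget are made up for illustration:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# A managed tabular dataset created from a CSV in Cloud Storage.
dataset = aiplatform.TabularDataset.create(
    display_name="churn-data",
    gcs_source="gs://my-bucket/churn.csv",
)

# AutoML handles feature engineering, model search, and tuning internally.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="churn-automl",
    optimization_prediction_type="classification",
)
model = job.run(
    dataset=dataset,
    target_column="churned",
    budget_milli_node_hours=1000,  # caps the search-and-tune spend
)

# The best model lands in the Model Registry and deploys like any other.
endpoint = model.deploy(machine_type="n1-standard-4")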
The trade-offs are predictable. AutoML tends to be more expensive than equivalent custom training because you are paying for the search and tuning loops on top of the actual training. You have less control over feature configuration and model internals. And your input data has to follow specific format requirements that AutoML imposes.
Pretrained APIs are the simplest path of all. There is no dataset, no training, no endpoint to manage. You make a REST call to a Google-hosted service and you get a structured response back.
For vision tasks, the Cloud Vision API handles label detection, OCR, face detection, and similar workloads on images. For audio, the Speech-to-Text API transcribes spoken language and the Text-to-Speech API generates synthetic speech. For text, the Natural Language API does entity extraction, sentiment analysis, and syntax parsing, and the Translation API handles language conversion.
The architectural integration looks like a single API call:
from google.cloud import vision

# Create a client for the pretrained Cloud Vision API.
client = vision.ImageAnnotatorClient()
# Reference an image that already lives in Cloud Storage.
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/photo.jpg"))
# A single call returns labels; no training or deployment happened on our side.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, label.score)

No training data, no model registry, no endpoint provisioning. The downside is that the model is generic. It works well for common tasks on general data, but it will not recognize the specific objects, terminology, or patterns that exist only in your domain.
The Professional Cloud Architect exam tests this decision through scenario language. The cues for each tier are consistent.
Reach for Pretrained APIs when the scenario describes a common task, such as image labeling, speech transcription, document OCR, or sentiment analysis, and the input data sounds general rather than domain-specific. The signal is that the team wants to ship quickly and the use case fits a category Google has already trained models for. If the scenario emphasizes domain-specific vocabulary, proprietary classes, or fine-grained internal control, pretrained APIs are not the right answer.
Reach for AutoML when the scenario describes a team that wants a custom model on their own data but does not have deep ML expertise, or wants to minimize preprocessing and model development time. The signal is phrases like quick time to value, business analysts building models, or no-code requirements. The trade-off you should mentally check off is that AutoML costs more and gives less internal control, and the scenario should tolerate both.
Reach for Custom Training when the scenario demands complete control over data, model internals, or hardware. The signals are an in-house ML team, novel problem types, custom model architectures, research-level work, or specific accelerator requirements like multi-GPU distributed training. If the scenario mentions hyperparameter strategies the team wants to control directly, or non-standard frameworks, that is also Custom Training territory.
Consider a worked example. A Professional Cloud Architect exam scenario describes a retailer that wants to add product image search to its mobile app. The catalog has 12 million unique items, each photographed against a white background, and customers should be able to upload a photo of an item they like and find visually similar products in the catalog.
The first instinct might be to call the Cloud Vision API. But Vision API labels are generic, things like shoe, dress, or chair. They will not return rankings against a 12-million-item proprietary catalog. The right answer is AutoML Vision or a custom-trained embedding model, depending on the team's ML maturity. If the question emphasizes a small team and fast time to market, AutoML is the better fit. If it emphasizes a research team building a novel similarity model, Custom Training is the answer.
The structure to apply on every question of this shape is the same. First, ask whether the task is generic enough for a pretrained API. If yes, that is the answer. If no, ask whether the team needs internal model control or has framework requirements. If yes, Custom Training. If no, AutoML.
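If it helps to see that flow written down, here is a purely illustrative sketch in Python; the function and its inputs are hypothetical, not anything from the Vertex AI SDK:

def recommend_vertex_tier(task_is_generic: bool, needs_model_control: bool) -> str:
    """Hypothetical decision helper mirroring the exam flow above."""
    if task_is_generic:
        return "Pretrained API"   # common task on general data
    if needs_model_control:
        return "Custom Training"  # in-house ML team, custom architectures, accelerators
    return "AutoML"               # custom data, minimal ML engineering

# Example: the retailer's image-search scenario with a small team.
print(recommend_vertex_tier(task_is_generic=False, needs_model_control=False))  # AutoML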
The Vertex AI tiers exist because no single model offering serves every workload. Pretrained APIs solve the common case fast. AutoML covers the middle ground where teams need their own model but do not want the engineering load of training infrastructure. Custom Training covers the cases where control matters more than convenience. The Professional Cloud Architect exam wants you to read a scenario, identify which trade-offs matter, and recommend accordingly.
If you want a structured walkthrough of these three Vertex AI tiers alongside the rest of the ML and AI material on the Professional Cloud Architect exam, the GCP Study Hub Professional Cloud Architect course covers the full ML domain with hands-on examples and exam-style questions.