Decoupling ML Models with Vertex AI Endpoints for the PCA Exam

GCP Study Hub
February 17, 2026

One of the architectural patterns that shows up reliably on the Professional Cloud Architect exam is the decoupling of a machine learning model from the application that consumes it. The exam tends to wrap this in a fraud detection scenario, but the pattern generalizes to any system where ML predictions are part of the runtime path. I want to walk through how I think about this pattern, why Vertex AI endpoints are the answer Google is looking for, and how to recognize the question when it appears.

The setup the exam uses

The framing tends to look something like this. You have an application written in custom Python code, backed by a PostgreSQL database that holds customer transaction data. There is an in-house fraud detection model already running. The business has decided it wants to modernize the architecture so the system can adopt new AI capabilities quickly and stay agile as the ML landscape evolves. The question asks what you should do.

The distractors all sound reasonable on the surface. One option will suggest replicating the data into BigQuery and using BigQuery ML. Another will suggest retraining the model with AutoML Tables and running scheduled batch predictions. A third will suggest moving the database to Cloud Spanner and developing custom models in Vertex AI Workbench. The right answer is to deploy the fraud detection model to a Vertex AI endpoint and call that endpoint from the Python application.

Why the other answers fail

The reason BigQuery ML is wrong is timing. Fraud detection has to evaluate each transaction as it happens, which means real-time, low-latency inference. BigQuery ML is built for batch predictions over large datasets, not for evaluating a single transaction in milliseconds. The analytics story is solid, but the serving story does not match the requirement.

AutoML Tables with batch predictions has the same problem and adds a second one. Batch predictions do not satisfy real-time fraud detection, and committing to AutoML Tables locks you into one framework. The whole point of the modernization goal is flexibility to adopt new model architectures as they emerge, and that goal is undermined by binding yourself to a single AutoML approach.

Cloud Spanner with custom models in Vertex AI Workbench is wrong because it answers the wrong question. Cloud Spanner addresses database scalability, which is not the bottleneck the scenario describes. And while Workbench is a fine place to develop models, it does not give you a managed serving layer. You would still need to build your own infrastructure to expose the model as an API, which is exactly what Vertex AI endpoints are designed to provide.

The tightly coupled architecture and what is wrong with it

Before I get into the endpoint pattern, it helps to be precise about what is wrong with the starting state. In the tightly coupled version, the application is one bundle. The transaction processing logic, the business rules, the API endpoints, and the fraud detection model all live in the same codebase. The application talks directly to PostgreSQL for customer data, and the model is invoked the way one function calls another inside the same process.
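
To make the coupling concrete, here is a minimal sketch of the in-process invocation, assuming a scikit-learn classifier serialized alongside the application; the file path, feature names, and threshold are all hypothetical.

```python
import joblib

# The model artifact ships inside the application's own deployment unit.
fraud_model = joblib.load("models/fraud_model.pkl")

def extract_features(transaction: dict) -> list[float]:
    # Hypothetical feature logic owned by the application codebase.
    return [transaction["amount"], transaction["merchant_risk"]]

def is_fraudulent(transaction: dict) -> bool:
    # An ordinary in-process function call: swapping the model or its
    # framework means changing and redeploying the whole application.
    score = fraud_model.predict_proba([extract_features(transaction)])[0][1]
    return score > 0.9
```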

The problems with this setup are not theoretical. If the data science group wants to push a new version of the model, the entire application has to be redeployed. If they want to experiment with a different framework or library, that experimentation pulls in code changes that ripple through the rest of the application. The ML group cannot move independently of the application release schedule, which means improvements queue up behind whatever else the application team is working on. And when a stronger model architecture appears in the broader ML world, getting it into production is gated by the application lifecycle rather than by the readiness of the model itself.

All of those problems trace back to the same root cause. The model and the application share a deployment unit, so they share a release cadence, a set of dependencies, and a blast radius for changes.

What the decoupled architecture looks like

The decoupled version keeps the application focused on what it is good at. The transaction logic, the business rules, and the API endpoints stay in the application. PostgreSQL continues to feed customer data into the application. None of that needs to change.

The change is that the fraud detection model moves out of the application codebase and into a Vertex AI endpoint. A Vertex AI endpoint is a managed service dedicated to serving a model. The model runs there, and the application calls it through a REST API. From the application's point of view, the model is no longer a function call inside the same process. It is a network call to a service that is owned by a different team and deployed on a different cadence.
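
Here is a minimal sketch of that network call from the Python side, assuming the google-cloud-aiplatform SDK, a model already deployed, and a serving container that returns a fraud probability; the project, region, endpoint ID, and instance schema are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# The endpoint is owned by the data science team; the application only
# holds its resource name.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

def is_fraudulent(transaction: dict) -> bool:
    # A REST call under the hood. The instance schema must match whatever
    # the deployed model's serving container expects.
    response = endpoint.predict(instances=[{
        "amount": transaction["amount"],
        "merchant_risk": transaction["merchant_risk"],
    }])
    # Assumes the model returns a fraud probability as its prediction.
    return response.predictions[0] > 0.9
```

Notice that the application no longer imports a model artifact or an ML framework at all; its only dependency is the endpoint's API contract.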

What you actually get from the endpoint

Once the model is behind a Vertex AI endpoint, the data science group can push new versions of the model without touching the application. That alone is a significant unblocking. Adopting a new framework or a new model architecture becomes a deployment to the endpoint rather than a coordinated release with the application team.
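
What pushing a new version looks like in practice, sketched with the same SDK and placeholder names (the artifact path and prebuilt serving image are illustrative):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the retrained model. The framework behind it can change freely,
# because the application only sees the endpoint's API contract.
new_model = aiplatform.Model.upload(
    display_name="fraud-detector-v2",
    artifact_uri="gs://my-bucket/models/fraud-v2/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
    ),
)

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# Canary the new version: 10% of traffic goes to v2 while the existing
# deployed model keeps serving the rest. No application release involved.
endpoint.deploy(
    model=new_model,
    deployed_model_display_name="fraud-detector-v2",
    machine_type="n1-standard-4",
    traffic_percentage=10,
)
```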

The endpoint is also reusable. If a second application has a fraud detection use case, it can call the same endpoint instead of duplicating the model into a second codebase. Centralizing the serving layer prevents drift between applications and keeps a single source of truth for what the production model is doing.

Vertex AI autoscales the serving infrastructure based on request volume, so traffic spikes do not require the application team to provision capacity for the model. And because Vertex AI is the platform where Google integrates new AI capabilities first, including Gemini, custom training, and the MLOps tooling for monitoring and versioning, the endpoint pattern positions the system to absorb future capabilities without an architectural rewrite.
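
The autoscaling bounds are deploy-time parameters rather than application code. A variant of the deploy call above, with illustrative values:

```python
endpoint.deploy(
    model=new_model,
    machine_type="n1-standard-4",
    min_replica_count=1,   # keep a replica warm so per-transaction latency stays low
    max_replica_count=10,  # let Vertex AI scale out under traffic spikes
)
```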

How to spot the question on the exam

The signal that this pattern is being tested is a scenario that explicitly mentions agility, the ability to adopt new ML capabilities quickly, or independence between the application team and the ML group. When those phrases appear alongside a description of a model embedded in application code, the right answer is almost always to move the model behind a Vertex AI endpoint.

The trap to avoid is treating the question as a database modernization question or an analytics question. The Professional Cloud Architect exam will dangle BigQuery, Cloud Spanner, and AutoML in front of you, and each of those is the right answer to a different question. For this one, the requirement is real-time serving with a managed API contract that decouples the ML lifecycle from the application lifecycle, and Vertex AI endpoints are the only option that delivers all of that.

If you want a deeper walk through Vertex AI endpoints alongside the rest of the advanced architecture material that shows up on this certification, my full course is at https://gcpstudyhub.com/courses/professional-cloud-architect.
