Pre-trained AI APIs for the PCA Exam: NL, Vision, Translation, Speech

GCP Study Hub
Ben Makansi
March 28, 2026

Pre-trained APIs are the easiest AI questions on the Professional Cloud Architect exam. They are ready-to-use machine learning models exposed via API for common tasks, with no training required. The exam does not ask you to architect them. It asks you to match a scenario to the right one. If you can do that quickly, you bank time for the harder questions.

Here is the surface area I would commit to memory before sitting for the PCA.

The framing

The first thing to know is when a pre-trained API is the right answer at all. Pre-trained APIs are good for common tasks and general data. If a scenario calls for domain-specific customization or a custom model trained on proprietary data, you should be looking at AutoML or custom training on Vertex AI, not a pre-trained API. If the scenario describes a general task on general data, the pre-trained API is the cheaper and faster answer.

Cloud Natural Language API

Cloud Natural Language analyzes text. The three features to remember are sentiment analysis, entity extraction, and syntax parsing.

  • Sentiment analysis returns a score from -1.0 (negative) to 1.0 (positive), along with a magnitude that reflects how much emotion the text carries overall. A sentence like "John loves his new iPhone that he bought in New York last week" comes back with a strongly positive score.
  • Entity extraction identifies the things mentioned in the text. In that same sentence, the API tags "John" as a PERSON, "iPhone" as a CONSUMER_GOOD, "New York" as a LOCATION, and "last week" as a DATE.
  • Syntax parsing returns each token along with its part of speech and its syntactic role, so you can tell that "John" is a proper noun acting as the subject and "loves" is the verb at the root of the sentence.

If a scenario asks for sentiment scoring of customer reviews, tagging entities in unstructured text, or breaking sentences into grammatical structure, Natural Language API is the answer.
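To make the three features concrete, here is a minimal sketch of the REST requests behind them. It only builds the URL and JSON body and sends nothing. The method and field names reflect my reading of the v1 REST reference, and build_nl_request is a helper name I made up for illustration, so check the current docs before relying on it.

```python
import json

# Assumption: v1 REST method names of the Natural Language API.
NL_METHODS = {
    "sentiment": "analyzeSentiment",
    "entities": "analyzeEntities",
    "syntax": "analyzeSyntax",
}


def build_nl_request(feature: str, text: str) -> tuple[str, dict]:
    """Return (url, json_body) for one feature. Hypothetical helper; no network call."""
    method = NL_METHODS[feature]
    url = f"https://language.googleapis.com/v1/documents:{method}"
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return url, body


sentence = "John loves his new iPhone that he bought in New York last week."
for feature in NL_METHODS:
    url, body = build_nl_request(feature, sentence)
    print(url)
    print(json.dumps(body))
```

The point for the exam is that all three features take the same document and differ only in the method you call.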

Cloud Vision and Video Intelligence APIs

These two APIs offer similar features. Vision works on images. Video Intelligence works on video and can annotate at the video, shot, or frame level. The features to remember are:

  • Image and video labeling. Identifies general concepts and objects, like people, street, or mode of transport. High level, useful for categorization and metadata.
  • Object localization (object detection). Goes further than labeling by drawing bounding boxes around each object so you can count or track instances.
  • Text detection. OCR for both printed and handwritten text. Useful for scanning documents or extracting text from signage.
  • Location detection (called landmark detection in the Vision API). Identifies famous landmarks and returns geographic coordinates. This one is Vision only, not Video Intelligence. That distinction is the kind of detail the exam likes to test.
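The features above map onto feature types in a single Vision request. Below is a hedged sketch of what the JSON body for the images:annotate method might look like. The enum names are as I understand the Vision REST reference, build_vision_request is a hypothetical helper, and the bucket path is a placeholder.

```python
import json

# Assumption: feature type enum names from the Vision REST reference.
VISION_FEATURES = {
    "labels": "LABEL_DETECTION",
    "objects": "OBJECT_LOCALIZATION",
    "text": "TEXT_DETECTION",
    "landmarks": "LANDMARK_DETECTION",  # the "location detection" feature
}


def build_vision_request(image_uri: str, wanted: list[str]) -> dict:
    """Build an images:annotate body for the requested features. No network call."""
    features = [{"type": VISION_FEATURES[name]} for name in wanted]
    return {
        "requests": [
            {"image": {"source": {"imageUri": image_uri}}, "features": features}
        ]
    }


# Placeholder Cloud Storage path, not a real object.
body = build_vision_request("gs://example-bucket/street.jpg", ["labels", "landmarks"])
print(json.dumps(body, indent=2))
```

One request can ask for several features at once, which is why the scenario wording (categorize, count, read text, find a landmark) is what tells you which feature is being tested.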

Face detection is not facial recognition

Both Vision and Video Intelligence can also do face detection. They identify facial landmarks like eyes, nose, and mouth, and they can estimate emotions like joy, sorrow, anger, or surprise.

The trap on the exam is that face detection is not the same as facial recognition. These APIs detect that there is a face and where it is, not whose face it belongs to. If a scenario asks you to identify or verify a specific person's identity, the answer is not Cloud Vision.
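A quick way to internalize the distinction is to look at the shape of a face annotation. The response below is a hand-written sample, not real API output, using field names as I understand them from the Vision reference. The helper summarize_faces is hypothetical. Notice what is present (a bounding box and emotion likelihoods) and what is absent (any name or identity).

```python
# Hand-written sample shaped like a Vision face detection response (illustrative only).
sample_response = {
    "responses": [
        {
            "faceAnnotations": [
                {
                    "boundingPoly": {
                        "vertices": [
                            {"x": 10, "y": 20},
                            {"x": 110, "y": 20},
                            {"x": 110, "y": 140},
                            {"x": 10, "y": 140},
                        ]
                    },
                    "joyLikelihood": "VERY_LIKELY",
                    "sorrowLikelihood": "VERY_UNLIKELY",
                    "angerLikelihood": "VERY_UNLIKELY",
                    "surpriseLikelihood": "UNLIKELY",
                }
            ]
        }
    ]
}

EMOTIONS = ("joy", "sorrow", "anger", "surprise")


def summarize_faces(response: dict) -> list[dict]:
    """Return where each face is and its emotion likelihoods. There is no identity to return."""
    faces = response["responses"][0].get("faceAnnotations", [])
    return [
        {
            "box": face["boundingPoly"]["vertices"],
            "emotions": {e: face.get(f"{e}Likelihood", "UNKNOWN") for e in EMOTIONS},
        }
        for face in faces
    ]


for face in summarize_faces(sample_response):
    print(face["emotions"])
```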

Cloud Translation API

Cloud Translation has three features worth remembering:

  • Language translation across more than 100 languages. Send text, get a translated version back.
  • Language detection for cases where the source language is not known ahead of time.
  • Document translation with format retention, which lets you translate full documents like PDFs or Word files while preserving the original structure and formatting.

The format-retention piece is the one to remember. If a scenario specifically calls out preserving document formatting during translation, that points at Cloud Translation rather than a generic text-translation flow.
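Here is a rough sketch of how the three features differ at the request level. Nothing is sent. The endpoints and field names are my best understanding of the v2 (text, detect) and v3 (document) REST surfaces, the function names are hypothetical, and the project, location, and bucket values are placeholders, so verify against the current reference before using any of it.

```python
BASE = "https://translation.googleapis.com"


def build_translate_text(text: str, target: str) -> tuple[str, dict]:
    """Plain text translation (assumed v2 endpoint)."""
    return f"{BASE}/language/translate/v2", {"q": text, "target": target, "format": "text"}


def build_detect_language(text: str) -> tuple[str, dict]:
    """Language detection when the source language is unknown (assumed v2 endpoint)."""
    return f"{BASE}/language/translate/v2/detect", {"q": text}


def build_translate_document(
    project_id: str, location: str, input_uri: str, output_prefix: str, target: str
) -> tuple[str, dict]:
    """Document translation that keeps layout (assumed v3 translateDocument method)."""
    url = f"{BASE}/v3/projects/{project_id}/locations/{location}:translateDocument"
    body = {
        "targetLanguageCode": target,
        "documentInputConfig": {"gcsSource": {"inputUri": input_uri}},
        "documentOutputConfig": {"gcsDestination": {"outputUriPrefix": output_prefix}},
    }
    return url, body


# Placeholder project, location, and bucket names.
url, body = build_translate_document(
    "example-project", "global", "gs://example-bucket/contract.pdf", "gs://example-bucket/out/", "es"
)
print(url)
print(body)
```

The text call takes a string and returns a string. The document call takes a file and returns a file, which is where the formatting survives.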

Speech-to-Text and Text-to-Speech

Two APIs going in opposite directions:

  • Speech-to-Text takes audio and returns a transcription. Common scenarios include captions, voice commands, and call center logs.
  • Text-to-Speech takes text and synthesizes human-like audio with customizable voice, pitch, and speed. Common scenarios include virtual assistants and reading apps.
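For the Text-to-Speech side, the "customizable voice, pitch, and speed" part shows up as fields in the request. This is a minimal sketch of a text:synthesize body with field names as I understand the v1 REST reference. build_tts_request is a hypothetical helper and nothing is sent.

```python
def build_tts_request(
    text: str, language_code: str = "en-US", pitch: float = 0.0, speaking_rate: float = 1.0
) -> dict:
    """Build a synthesize body; pitch and speaking_rate are the customization knobs."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            "pitch": pitch,
            "speakingRate": speaking_rate,
        },
    }


# Slightly lower and slower than the defaults, for a reading app.
print(build_tts_request("Chapter one.", pitch=-2.0, speaking_rate=0.9))
```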

Two extra Speech-to-Text details that show up on exam questions:

  1. It supports both real-time streaming and batch transcription, so it covers live use cases and pre-recorded files.
  2. It supports speaker labeling (called speaker diarization in the API), identifying which parts of a conversation came from which speaker as Speaker 1, Speaker 2, and so on.

If you see a scenario about transcribing a multi-party call and tagging who said what, speaker labeling in Speech-to-Text is the giveaway.
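To see what "tagging who said what" looks like in practice, here is a small sketch. The config fields and the per-word speakerTag follow my understanding of the v1 Speech-to-Text API, the sample words are hand-written rather than real output, and both function names are hypothetical.

```python
def build_diarization_config(language_code: str, min_speakers: int, max_speakers: int) -> dict:
    """Recognition config with speaker labeling turned on (assumed v1 field names)."""
    return {
        "languageCode": language_code,
        "diarizationConfig": {
            "enableSpeakerDiarization": True,
            "minSpeakerCount": min_speakers,
            "maxSpeakerCount": max_speakers,
        },
    }


def group_by_speaker(words: list[dict]) -> list[str]:
    """Collapse consecutive words with the same speakerTag into one line per turn."""
    turns: list[tuple[int, list[str]]] = []
    for w in words:
        tag = w["speakerTag"]
        if turns and turns[-1][0] == tag:
            turns[-1][1].append(w["word"])
        else:
            turns.append((tag, [w["word"]]))
    return [f"Speaker {tag}: {' '.join(ws)}" for tag, ws in turns]


# Hand-written sample of word-level results (illustrative only).
sample_words = [
    {"word": "Hello", "speakerTag": 1},
    {"word": "there", "speakerTag": 1},
    {"word": "Hi", "speakerTag": 2},
    {"word": "thanks", "speakerTag": 1},
]

print(build_diarization_config("en-US", 2, 2))
for line in group_by_speaker(sample_words):
    print(line)
# Speaker 1: Hello there
# Speaker 2: Hi
# Speaker 1: thanks
```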

Recommendations AI

One more API can show up on the Professional Cloud Architect exam: Recommendations AI, which is built for ecommerce sites that need personalized product recommendations. The two strategies to recognize:

  • Others You May Like. Focused on increasing click-through rate by surfacing visually or behaviorally similar items. Engagement, not directly revenue.
  • Frequently Bought Together. Focused on upselling and increasing cart size by surfacing complementary products. Conversion and revenue.

If a scenario emphasizes click-through rate on similar items, that is the first pattern. If it emphasizes upselling and growing cart size, that is the second.

How I would study these

Make a one-line cheat sheet for each API:

  • Natural Language for text.
  • Vision and Video Intelligence for images and videos, with location detection unique to Vision and face detection not equal to facial recognition.
  • Translation for over 100 languages with format-retaining document translation.
  • Speech-to-Text and Text-to-Speech for audio in both directions, with speaker labeling and real-time plus batch on the transcription side.
  • Recommendations AI for ecommerce, with click-through versus cart-size as the split.

If you can match a scenario to one of those one-liners in a few seconds, the questions in this category become very fast points.
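If it helps to drill, the cheat sheet can be written as a toy lookup. This is purely a study aid I am sketching here: the keyword lists are my own guesses at typical scenario wording, match_api is a made-up function, and real exam questions need actual reading rather than substring matching.

```python
# Toy keyword table, checked in order; first match wins. Keywords are illustrative guesses.
CHEAT_SHEET = [
    (("speaker", "transcri", "caption"), "Speech-to-Text"),
    (("synthesize", "read aloud"), "Text-to-Speech"),
    (("translat",), "Cloud Translation"),
    (("sentiment", "entities", "syntax"), "Cloud Natural Language"),
    (("video",), "Video Intelligence"),
    (("image", "photo", "landmark", "ocr"), "Cloud Vision"),
    (("recommend", "cart size", "click-through"), "Recommendations AI"),
]


def match_api(scenario: str) -> str:
    """Map a scenario description to the one-liner it most likely points at."""
    text = scenario.lower()
    # The framing check comes first: custom or proprietary data rules out pre-trained APIs.
    if "proprietary" in text or "custom model" in text:
        return "AutoML / Vertex AI (not a pre-trained API)"
    for keywords, answer in CHEAT_SHEET:
        if any(k in text for k in keywords):
            return answer
    return "no match"


print(match_api("Transcribe a multi-party call and tag who said what"))
print(match_api("Score the sentiment of customer reviews"))
print(match_api("Train a custom model on proprietary medical scans"))
```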

My Professional Cloud Architect course covers the pre-trained AI APIs alongside the rest of the advanced architecture material, so you see them in context with AutoML, Vertex AI, and the rest of the data and AI services Google expects you to recognize on exam day.
