Retrieval-Augmented Generation (RAG) for the Generative AI Leader Exam

GCP Study Hub
Ben Makansi
September 16, 2025

When I sit down with the Retrieval-Augmented Generation material for the Generative AI Leader exam, I treat it as a single pipeline with a fixed order. Grounding tells you why the model needs an external source. RAG tells you how that source actually gets stitched into the prompt. The exam questions usually probe whether you know the stages and whether you know that retrieval happens before generation.

Retrieval-Augmented Generation is a technique where the model retrieves relevant information from your own documents or databases before generating a response, so the output is anchored in your specific, proprietary data instead of training memory alone. That definition is the one I would memorize verbatim for the Generative AI Leader exam, because question stems often paraphrase it and ask you to pick the matching technique out of a list that includes fine-tuning, prompt engineering, and grounding.

The six stages of RAG

The pipeline runs as a six-step sequence. Each step has a clear input, a clear output, and a clear reason it exists.

1. Ingestion and chunking

You start by gathering source documents: PDFs, API outputs, internal wikis, whatever your data looks like. Those documents are split into smaller pieces called chunks. The reason for chunking is to keep the context manageable instead of handing the model an entire document at once.
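As a sketch of this step, here is a minimal word-based chunker with overlap. The sizes here are toy values; production pipelines usually chunk by tokens or sentences and tune the sizes per document type.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-based chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break  # last chunk reached; avoid emitting tiny overlap-only tails
    return chunks

doc = "word " * 120               # stand-in for a real document
chunks = chunk_text(doc.strip())  # 120 words -> 3 overlapping chunks
```

The overlap means neighboring chunks share a little text, so a sentence that straddles a boundary still survives intact in at least one chunk.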

2. Embedding

Each chunk is passed through an embedding model, which converts the text into a numerical vector. That vector is a list of numbers that captures the semantic meaning of the chunk. Similar content produces similar vectors, which is the property the next step relies on.
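To make the "similar content produces similar vectors" property concrete, here is a hand-rolled bag-of-words embedding over a tiny made-up vocabulary. This is purely illustrative; a real pipeline would call a trained embedding model, which produces dense vectors with hundreds of dimensions.

```python
import math

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy bag-of-words embedding: one dimension per vocabulary word,
    normalized so the dot product equals cosine similarity."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))

vocab = ["refund", "policy", "shipping", "returns", "invoice"]
a = embed("our refund policy covers returns", vocab)
b = embed("returns and refund policy", vocab)
c = embed("shipping invoice details", vocab)

similar = dot(a, b)    # high: both chunks are about refunds
unrelated = dot(a, c)  # near zero: different topics
```

Even with this crude scheme, the two refund chunks land close together in vector space while the shipping chunk lands far away, which is exactly the property retrieval exploits.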

3. Indexing

The vectors are stored in a vector database; Pinecone and Chroma are common examples. A vector database is optimized for one job: searching for similar vectors quickly and efficiently.

4. Retrieval

When a user submits a question, that question is also converted into a vector. The system then searches the vector database for the chunks whose vectors are most similar to the query vector. Those are the most semantically relevant pieces of content.
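Steps 3 and 4 can be sketched together with a tiny in-memory stand-in for a vector database. The chunks and vectors below are hand-made for illustration; a real system would get the vectors from an embedding model and use an approximate nearest-neighbor index for speed.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

class TinyVectorIndex:
    """In-memory stand-in for a vector database such as Pinecone or Chroma."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, chunk: str, vector: list[float]) -> None:
        self.entries.append((chunk, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        # Brute-force scan; real vector databases use approximate
        # nearest-neighbor indexes to keep this fast at scale.
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(entry[1], query_vector),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

index = TinyVectorIndex()
index.add("Refunds are issued within 14 days.", [0.9, 0.1, 0.0])
index.add("Shipping takes 3-5 business days.", [0.1, 0.9, 0.0])
index.add("Invoices are emailed monthly.", [0.0, 0.1, 0.9])

# Hand-made "embedding" of the question "How long do refunds take?"
query_vector = [0.8, 0.2, 0.0]
top = index.search(query_vector, k=1)  # the refund chunk ranks first
```

The query vector sits closest to the refund chunk's vector, so that chunk comes back first, which is the whole point of indexing by similarity.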

5. Augmentation

The retrieved chunks are combined with the user's original prompt to create an enriched, context-heavy input for the model. Instead of receiving a bare question, the model now receives the question plus the supporting information that the retrieval step pulled back.
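A minimal sketch of augmentation, assuming one simple hypothetical template; real applications tune this format heavily.

```python
def augment_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved chunks with the user's question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = augment_prompt(
    "How long do refunds take?",
    ["Refunds are issued within 14 days.", "Refunds require a receipt."],
)
```

The model never sees the bare question; it sees the question packaged with the evidence the retrieval step pulled back.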

6. Generation

The large language model, such as Gemini, reads the augmented prompt and produces a grounded response based on the retrieved data rather than training memory alone.

Why the order matters on the exam

The single most important insight on RAG is that retrieval happens before generation. That ordering is what makes the response grounded and trustworthy. If a question describes a flow where the model generates first and then checks sources after the fact, that is not RAG. RAG enriches the prompt before the model writes a single token.
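That ordering is easy to see in code. This end-to-end sketch stubs out retrieval and generation entirely; the point is purely the control flow, with retrieval and augmentation finishing before the model is ever called.

```python
def retrieve(question: str) -> list[str]:
    # Stub: a real system embeds the question and queries the vector database.
    return ["Refunds are issued within 14 days."]

def generate(prompt: str) -> str:
    # Stub: a real system sends the prompt to an LLM such as Gemini.
    return "Answer based on: " + prompt

def rag_answer(question: str) -> str:
    chunks = retrieve(question)                                   # retrieval first
    prompt = "Context:\n" + "\n".join(chunks) + "\n\nQuestion: " + question  # then augmentation
    return generate(prompt)                                       # generation last

answer = rag_answer("How long do refunds take?")
```

If you swapped the order and called generate before retrieve, the model would be answering from training memory alone, and the flow would no longer be RAG.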

The other detail worth holding onto is the relationship between RAG and grounding. Grounding is the broader practice of connecting model output to verifiable external sources. RAG is one concrete implementation of that practice, built around vector search over your own data. On the Generative AI Leader exam, expect questions that ask you to map a business scenario, such as a support agent answering from internal documentation, to the technique that fits. RAG is the answer when the scenario involves proprietary documents that the base model has never seen.

My Generative AI Leader course walks through RAG, grounding, and vector databases alongside the rest of the foundational material you need for the exam.
