Spanner Vector Search: Storing and Querying Embeddings at Scale

GCP Study Hub
June 2, 2026

A vector embedding converts data such as text, images, or audio into a numerical array of floating-point values. Cloud Spanner can store those arrays and run similarity searches against them, which lets you find records by meaning rather than by exact text match. For the Professional Cloud Database Engineer exam, the useful framing is that this is the same set of mathematical principles you may have seen in Cloud SQL, applied inside a database built for global scale. The exam tends to test when vector search is the right approach and how it differs from keyword-based filtering, so it helps to be precise about both.

What an embedding represents

An embedding turns raw information into a format suited to mathematical analysis. Once each item is stored as a numerical array, Spanner can treat it as a point in a multi-dimensional space. A short text example makes this concrete. An embedding model maps the word "fast" to a specific set of coordinates. Because "quick" has a similar meaning, the model places it at nearly the same coordinates. When Spanner compares those values, it finds that the two points lie close together even though the characters do not match. This is what allows similarity search to work on intent and concept rather than on literal spelling.
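To make the geometry concrete, here is a minimal Python sketch. The three-dimensional vectors are invented for illustration (real embedding models output hundreds of dimensions), and the distance is computed by hand rather than by Spanner, so treat it as a picture of the idea, not of Spanner's internals.

```python
import math

# Hypothetical 3-dimensional embeddings; the values are made up.
EMBEDDINGS = {
    "fast":  [0.91, 0.10, 0.05],
    "quick": [0.89, 0.12, 0.07],
    "slow":  [-0.85, 0.15, 0.02],
}

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_quick = euclidean(EMBEDDINGS["fast"], EMBEDDINGS["quick"])
d_slow = euclidean(EMBEDDINGS["fast"], EMBEDDINGS["slow"])
print(f"fast vs quick: {d_quick:.3f}")  # small: similar meaning
print(f"fast vs slow:  {d_slow:.3f}")   # large: different meaning
```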

Distance functions in Spanner

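To compare two vectors, Spanner provides a set of vector distance functions. The three you should know are EUCLIDEAN_DISTANCE(), COSINE_DISTANCE(), and DOT_PRODUCT(). Each measures proximity between two vectors in a slightly different way, and you choose the one that fits your use case. EUCLIDEAN_DISTANCE() calculates the straight-line distance between two points in the coordinate system, so a smaller value means a closer match. COSINE_DISTANCE() returns one minus the cosine similarity, which depends only on the angle between the vectors and ignores their length; smaller is again closer. DOT_PRODUCT() returns the sum of the element-wise products, and here a larger value indicates greater similarity, so results are ordered descending rather than ascending.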
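The following plain-Python sketch restates the three textbook formulas so their differences are visible side by side. It is not Spanner code; it only assumes the standard definitions, with cosine distance taken as one minus cosine similarity. Two vectors that point the same way but differ in length show the contrast: Euclidean distance sees them as apart, while cosine distance treats them as identical.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance: smaller means closer.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Sum of element-wise products: larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on the angle, not on magnitude.
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    return 1.0 - dot_product(a, b) / norm

a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice as long

print(euclidean_distance(a, b))  # nonzero: the points are apart
print(cosine_distance(a, b))     # essentially 0: same direction
print(dot_product(a, b))         # 10.0
```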

A basic query orders results by one of these functions. You select the column you want, order by the distance between the stored vector and the search vector generated from the user's query, and limit the result. The item with the smallest distance value is the closest match.

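-- @vector_search is a query parameter holding the embedding of the user's query.
SELECT product_name
FROM products
-- Ascending order (the default) puts the smallest distance, the closest match, first.
ORDER BY EUCLIDEAN_DISTANCE(vector_product, @vector_search)
LIMIT 1;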

Here LIMIT 1 returns the single nearest item. The same shape extends to returning the top several matches by raising the limit.
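A brute-force equivalent of that query, written as a hypothetical Python sketch, makes the cost visible: every stored vector is scored before the list is sorted and cut to k entries. The product names and vectors are made up.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(rows, query_vector, k):
    """Python analogue of ORDER BY EUCLIDEAN_DISTANCE(...) LIMIT k.

    rows holds (product_name, vector_product) pairs; every row is scored,
    so the work grows linearly with the number of rows.
    """
    ranked = sorted(rows, key=lambda row: euclidean_distance(row[1], query_vector))
    return [name for name, _ in ranked[:k]]

# Hypothetical catalog rows and search vector; all values are made up.
products = [
    ("waterproof jacket", [0.9, 0.8, 0.1]),
    ("sun hat",           [0.1, 0.2, 0.9]),
    ("trail boots",       [0.7, 0.6, 0.3]),
]
search_vector = [0.85, 0.75, 0.15]

print(exact_top_k(products, search_vector, 1))  # ['waterproof jacket']
print(exact_top_k(products, search_vector, 2))  # ['waterproof jacket', 'trail boots']
```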

Vector search versus LIKE and SEARCH()

Consider a user who types a query such as "I need something for a rainy day hike" to find gear in a catalog. If you apply a standard LIKE filter, the database looks for that exact sequence of characters. That is unreliable for natural language, because it requires the product description to match the user's phrasing word for word.

Spanner also offers a SEARCH() function for full-text indexing. It is more robust than LIKE because it looks for individual tokens and synonyms rather than one exact string. It works in some cases, but it is still fundamentally keyword matching. It relies on word overlap, so it can miss relevant results when the right product is described in different terms than the query uses.
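A rough Python sketch shows why keyword approaches struggle with this query. The substring check approximates what LIKE '%...%' does, and the token-overlap count is a crude stand-in for full-text matching; neither reproduces Spanner's actual SEARCH() behavior. The catalog descriptions are invented.

```python
import re

def like_contains(text, phrase):
    # Roughly what LIKE '%phrase%' checks: the exact character sequence.
    return phrase.lower() in text.lower()

def tokens(text):
    # Lowercase word tokens; a crude stand-in for a full-text tokenizer.
    return set(re.findall(r"[a-z]+", text.lower()))

def token_overlap(text, query):
    # Number of distinct words the text shares with the query.
    return len(tokens(text) & tokens(query))

query = "I need something for a rainy day hike"
catalog = {
    "waterproof jacket": "Breathable waterproof shell jacket with sealed seams",
    "sun hat": "Wide-brim hat with UPF 50 sun protection for the day",
}

for name, description in catalog.items():
    print(name, like_contains(description, query), token_overlap(description, query))
```

Here the sun hat shares the words "for" and "day" with the query and outranks the jacket, which shares none, even though the jacket is what a rainy hike calls for.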

Vector similarity search, using a distance function such as EUCLIDEAN_DISTANCE(), handles this case better. Instead of looking for matching words, Spanner calculates the distance between the embedding of the search text and the embeddings of the products, provided both were generated by the same embedding model. Because the intent of the query and the purpose of a waterproof jacket land close together in that vector space, the jacket comes back as the best match for a rainy hike even though the two share no words. On the exam, when a scenario describes natural-language intent that LIKE or keyword search would fail to satisfy, vector search is generally the answer.
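The sketch below ranks a small, hypothetical catalog by Euclidean distance to a query embedding. The embeddings are invented to stand in for the output of a single embedding model applied to both the query and the product descriptions; the point is only that ranking by distance can surface the jacket with zero shared words.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings, as if the query and the descriptions had been
# embedded by the same model. All values are made up.
query_embedding = [0.9, 0.8, 0.0]
product_embeddings = {
    "waterproof jacket": [0.95, 0.7, 0.05],
    "sun hat":           [0.1, 0.7, 0.6],
    "reading lamp":      [0.0, 0.05, 0.1],
}

# Closest first, the same ordering ORDER BY EUCLIDEAN_DISTANCE(...) produces.
ranked = sorted(
    product_embeddings,
    key=lambda name: euclidean_distance(product_embeddings[name], query_embedding),
)
print(ranked)  # ['waterproof jacket', 'sun hat', 'reading lamp']
```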

Vector indexes and approximate nearest neighbor search

Ordering by a distance function over every row works, but it does not scale well on its own because it compares the search vector against all stored vectors. To run an approximate nearest neighbor search, often shortened to ANN, you create a specialized vector index that Spanner uses to accelerate the query. The index must use a specific distance metric, set through the distance_type parameter to one of COSINE, DOT_PRODUCT, or EUCLIDEAN, chosen to match the use case.
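To show why an index helps, here is a deliberately simplified Python sketch of approximate search. It groups vectors into leaf partitions around k-means centroids and, at query time, scans only the few leaves whose centroids are closest to the query. This is an IVF-style illustration chosen for brevity, not Spanner's actual tree-based index; the class name, the leaves_to_probe parameter, and the random data are all hypothetical.

```python
import math
import random

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyPartitionIndex:
    """Hypothetical one-level partition index (IVF-style), not Spanner's design."""

    def __init__(self, rows, num_leaves, iterations=5, seed=0):
        rng = random.Random(seed)
        # Seed centroids with randomly chosen stored vectors.
        self.centroids = [list(vec) for _, vec in rng.sample(rows, num_leaves)]
        for _ in range(iterations):
            self._assign(rows)
            for i, members in enumerate(self.leaves):
                if members:  # an empty leaf keeps its previous centroid
                    columns = zip(*(vec for _, vec in members))
                    self.centroids[i] = [sum(col) / len(members) for col in columns]
        self._assign(rows)  # final assignment against the refined centroids

    def _nearest_leaves(self, vec):
        # Leaf ids ordered from the closest centroid to the farthest.
        return sorted(range(len(self.centroids)),
                      key=lambda i: euclidean_distance(vec, self.centroids[i]))

    def _assign(self, rows):
        # Put every row into the leaf with the nearest centroid.
        self.leaves = [[] for _ in self.centroids]
        for name, vec in rows:
            self.leaves[self._nearest_leaves(vec)[0]].append((name, vec))

    def search(self, query, k, leaves_to_probe):
        # Scan only the closest leaves, then rank their members exactly.
        candidates = [row for i in self._nearest_leaves(query)[:leaves_to_probe]
                      for row in self.leaves[i]]
        candidates.sort(key=lambda row: euclidean_distance(row[1], query))
        return [name for name, _ in candidates[:k]], len(candidates)

rng = random.Random(42)
rows = [(f"item-{i}", [rng.random(), rng.random()]) for i in range(500)]
index = ToyPartitionIndex(rows, num_leaves=20)
query = [0.5, 0.5]

exact = [name for name, _ in sorted(rows, key=lambda r: euclidean_distance(r[1], query))[:10]]
approx, scanned = index.search(query, k=10, leaves_to_probe=3)
full, scanned_all = index.search(query, k=10, leaves_to_probe=20)

print(f"probed 3 leaves: scanned {scanned} of {len(rows)} rows, "
      f"recall {len(set(approx) & set(exact))}/10")
print(f"probed all leaves: scanned {scanned_all} rows, same as exact: {full == exact}")
```

Probing every leaf degenerates into exact search, while probing a few leaves examines only a fraction of the rows and may miss some true neighbors: the same precision-for-speed trade that a real vector index makes.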

The example below creates a vector index on the nick_name_embeddings column of the Account table using the EUCLIDEAN distance type. The WHERE clause leaves rows without an embedding out of the index, and the tree_depth and num_leaves options control how the index is structured.

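-- ANN index over Account.nick_name_embeddings, fixed to the Euclidean metric.
CREATE VECTOR INDEX NickNameEmbeddingIndex
ON Account(nick_name_embeddings)
-- Only rows that actually have an embedding are indexed.
WHERE nick_name_embeddings IS NOT NULL
-- tree_depth sets the number of levels in the index tree; num_leaves sets
-- how many leaf partitions the vectors are divided into.
OPTIONS (distance_type = 'EUCLIDEAN', tree_depth = 2, num_leaves = 1000);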

Once the index exists, you can run an ANN vector search against the indexed nick_name_embeddings property. In Spanner Graph this is expressed over graph nodes, returning the top approximate nearest neighbors for a given embedding. The query uses the approximate form of the distance function, APPROX_EUCLIDEAN_DISTANCE(), together with a num_leaves_to_search option that controls how many of the index's leaves are examined: searching more leaves generally improves recall at the cost of latency. A graph hint such as FORCE_INDEX forces the query optimizer to use the specified vector index in the execution plan, which is how you make sure the ANN path is taken rather than a full comparison.

The distinction worth carrying into the exam is the difference between an exact ordering with a distance function and an approximate search backed by a vector index. The exact form compares against every vector and is simple to write. The approximate form trades a small amount of precision for speed at scale, and it depends on a vector index built with a matching distance_type. Knowing that the index pins the metric, and that the search function must agree with it, is the kind of detail these questions tend to turn on.
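The metric really does change the answer. In this minimal sketch with made-up two-dimensional vectors, Euclidean distance and cosine distance disagree about which stored vector is nearest to the query, which is why an approximate search has to use the same metric its index was built with.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: ignores vector length.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

query = [1.0, 0.0]
stored = {"x": [3.0, 0.0], "y": [1.0, 0.5]}  # hypothetical stored vectors

nearest_euclidean = min(stored, key=lambda k: euclidean_distance(stored[k], query))
nearest_cosine = min(stored, key=lambda k: cosine_distance(stored[k], query))
print(nearest_euclidean, nearest_cosine)  # y x
```

"x" points in exactly the query's direction but sits farther away, so cosine distance prefers it while Euclidean distance prefers the nearby "y".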

Our Professional Cloud Database Engineer course covers Spanner vector search alongside Spanner full-text search and the broader set of distance functions, with practice questions that drill these distinctions.
