Structured, Unstructured, and Semi-Structured Data for the PCA Exam

GCP Study Hub
Ben Makansi
April 26, 2026

The Professional Cloud Architect exam expects you to recognize three categories of data based on how it is organized. Pick the wrong category and you pick the wrong storage service, which is exactly the kind of mistake the exam is designed to catch.

Structured data

Structured data is highly organized and adheres to a predefined schema. Each row represents a record, each column represents an attribute, and the format is consistent across the entire dataset. Financial transactions, inventory records, and employee information all fit this shape because every entry has the same fields in the same order.

This organization is what makes structured data fast to query and easy to analyze. The trade-off is rigidity. If your data does not fit the schema, it does not go in the table.
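
As a rough illustration of that rigidity, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a relational database. The employees table, its columns, and the sample rows are hypothetical and chosen only for the example.

```python
import sqlite3


def demo_schema_rigidity():
    """Insert one row that fits the schema and one that does not."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: every record must have exactly these columns.
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL NOT NULL)"
    )
    # This record matches the schema, so it is accepted.
    conn.execute(
        "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
        (1, "Ada", 120000.0),
    )
    try:
        # This record carries a field the schema does not define.
        conn.execute(
            "INSERT INTO employees (id, name, salary, nickname) "
            "VALUES (?, ?, ?, ?)",
            (2, "Grace", 130000.0, "Amazing Grace"),
        )
        rejected = False
    except sqlite3.OperationalError:
        # SQLite refuses the insert because the column does not exist.
        rejected = True
    rows = conn.execute(
        "SELECT id, name, salary FROM employees ORDER BY id"
    ).fetchall()
    conn.close()
    return rows, rejected
```

Calling demo_schema_rigidity() leaves only the first row in the table. The second insert fails because the nickname column is not part of the schema.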

Unstructured data

Unstructured data is free-form. It lacks a predefined schema and shows up in a wide range of formats. Text-based examples include the body text of emails, social media posts, and chat logs. Image-based examples include smartphone photos, MRI scans, and satellite imagery. Video-based examples include security footage, recorded lectures, and game streams.

Because there is no schema to query against, you need specialized tools to extract meaning from unstructured data. Natural language processing handles the text. Image recognition models handle the visuals. The raw bytes themselves are just files.

Semi-structured data

Semi-structured data sits between the two. It has no fixed schema, but it carries tags, attributes, or metadata that describe its contents and make it easier to process. JSON is the canonical example, with key-value pairs that can nest arbitrarily. XML does the same job with tags, and YAML does it with keys and indentation. An email taken as a whole is semi-structured because it mixes a free-text body with headers like To, From, and Subject. NoSQL databases like MongoDB store data in flexible document formats that follow this same pattern.

The point of semi-structured data is that you get some organization without committing to the rigidity of a relational schema.
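
To make the idea concrete, here is a small sketch using Python's standard json module. The two event records and the helper names (field_names, get_nested) are hypothetical. The point is that both records are valid and self-describing even though they do not share the same fields.

```python
import json

# Two hypothetical events: same top-level idea, different shapes.
RAW_EVENTS = """
[
  {"user": "ana", "action": "login",
   "device": {"os": "android", "version": 14}},
  {"user": "ben", "action": "purchase",
   "items": [{"sku": "A1", "qty": 2}]}
]
"""


def field_names(raw):
    """Return the sorted top-level keys of each record."""
    events = json.loads(raw)
    return [sorted(event.keys()) for event in events]


def get_nested(record, path, default=None):
    """Walk a list of keys into nested dicts, returning default if any is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

With these records, get_nested(event, ["device", "os"]) finds a value for the first event and falls back to the default for the second. A fixed relational schema would not tolerate that kind of variation without schema changes or nullable columns.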

The GCP services that map to each category

This mapping is what the Professional Cloud Architect exam wants you to know cold.

For structured data, the three services are BigQuery, Cloud SQL, and Spanner. BigQuery is the analytical warehouse for large datasets. Cloud SQL is the managed relational database for traditional workloads that need MySQL, PostgreSQL, or SQL Server compatibility. Spanner is the relational database that scales globally with strong consistency.

For unstructured data, the answer is Cloud Storage. It holds images, videos, and large documents as objects, with no schema involved.

For semi-structured data, you have Bigtable, Firestore, and Memorystore. Bigtable is a wide-column NoSQL database for high-throughput workloads like time-series data. Firestore is a NoSQL document database that stores nested key-value structures. Memorystore is the managed in-memory key-value store used for caching and other low-latency lookups.

Why this matters on the exam

Most storage questions on the Professional Cloud Architect exam start by describing the shape of the data. Once you classify it as structured, unstructured, or semi-structured, the candidate services collapse to a short list. From there you pick based on access pattern, scale, and consistency requirements.
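
The two-step process described above can be sketched as a lookup table plus a filter. This is only a study aid, not an official decision tool. The service descriptions are condensed from this article, and the names SERVICES, shortlist, and narrow are hypothetical. The keyword match in narrow is deliberately crude.

```python
# Category -> {service: one-line description}, condensed from this article.
SERVICES = {
    "structured": {
        "BigQuery": "analytical warehouse for large datasets",
        "Cloud SQL": "managed relational database for traditional workloads",
        "Spanner": "relational database with global scale and strong consistency",
    },
    "unstructured": {
        "Cloud Storage": "object storage for images, videos, and large documents",
    },
    "semi-structured": {
        "Bigtable": "high-throughput workloads such as time-series data",
        "Firestore": "NoSQL document database for nested key-value structures",
        "Memorystore": "key-value store for caching and low-latency lookups",
    },
}


def shortlist(category):
    """Step 1: classify the data, then list the candidate services."""
    key = category.strip().lower()
    if key not in SERVICES:
        raise ValueError(f"unknown data category: {category!r}")
    return list(SERVICES[key])


def narrow(category, keyword):
    """Step 2: keep candidates whose description mentions the requirement."""
    key = category.strip().lower()
    needle = keyword.lower()
    return [
        name
        for name in shortlist(key)
        if needle in SERVICES[key][name].lower()
    ]
```

For example, shortlist("structured") yields BigQuery, Cloud SQL, and Spanner, and narrow("structured", "strong consistency") reduces that to Spanner. Real exam questions need more judgment than a substring match, but the order of operations is the same.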

Get the category wrong and every service you consider afterward is wrong too.

My Professional Cloud Architect course covers data type classification and storage service selection alongside the rest of the foundational architecture material.
