If you are using Gemini 2.5 models on Vertex AI, Google recently sent out emails confirming that Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite will be discontinued no earlier than October 16, 2026. The original retirement date was June 2026, so Google has extended the timeline to give teams more room to migrate.
That said, the October date is not final either. Google has stated that a confirmed discontinuation date will be set once Gemini 3 is Generally Available, and they will provide at least six months of notice once that date is locked in.
This post breaks down what is changing, what the migration involves, and a few things worth paying attention to as you plan the transition.
Three models are affected: Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite, all on Vertex AI specifically. This does not apply to Gemini on AI Studio, which has its own separate deprecation timeline.
If you are running production workloads on Vertex AI using any of these models, you will need to migrate to a Gemini 3 replacement before the retirement date.
The Gemini 3 family has been rolling out since November 2025. Here is a quick overview of what is currently available on Vertex AI.
Gemini 3.1 Pro is the most advanced reasoning model in the Gemini 3 series. It launched in February 2026 and replaced the original Gemini 3 Pro Preview, which was discontinued in March 2026. If you are currently using Gemini 2.5 Pro, this is the most likely migration target.
Gemini 3 Flash launched more recently and is designed for speed and cost efficiency while still delivering strong reasoning capabilities. If you are using Gemini 2.5 Flash today, this is the natural replacement.
Gemini 3.1 Flash Lite is the most cost-efficient model in the Gemini 3 family, optimized for high-volume, low-latency workloads. It launched in March 2026 and is the replacement path for Gemini 2.5 Flash Lite.
Google has said that Gemini 3 models are generally more token efficient and higher quality than their 2.5 predecessors, but they come with higher per-token prices. Whether your total cost goes up or down will depend on your specific use case and how many tokens your workflows consume.
This is the migration detail that is most likely to require code changes on your end.
Gemini 3 models introduce thought signatures, which are encrypted representations of the model's internal reasoning process. They preserve the model's reasoning state during multi-turn conversations, especially when using function calling. When a thinking model pauses to call an external tool, the thought signature acts as a save state that allows the model to pick up its chain of thought once you provide the function's result.
Here is the important part: Gemini 3 models enforce stricter validation on thought signatures than Gemini 2.5 did. You must capture the thought signatures from each model response and include them in your follow-up request exactly as they were received. If a required thought signature is missing, Gemini 3 models will return a 400 error. This is not a warning. It is a hard failure.
If you are using the official Google Gen AI SDK (Python, Node.js, Go, or Java) and using the standard chat history features or appending the full model response to the history, thought signatures are handled automatically. You do not need to do anything extra.
If you are interacting with the API directly or managing conversation history manually, you will need to explicitly capture and pass back the thought signatures yourself. Google has a Thought signatures guide in their documentation that walks through the implementation.
Google's email notes that costs will change with this model upgrade. Gemini 3 models have higher per-token prices, but because they are more token efficient, the total cost for a given task may not be higher. It depends on the use case.
One thing Google suggests, and this is worth considering, is that you may find a better cost balance by shifting model tiers during the migration. For example, migrating from Flash to Flash Lite, or from Pro to Flash, rather than going straight across to the same tier in Gemini 3. If your workloads do not require the full reasoning capabilities of Pro, dropping down a tier could keep your costs in check while still getting the quality improvements of Gemini 3.
Review the Vertex AI pricing documentation for Gemini 3 models before you commit to a migration path.
Test your workflows. Do not wait until the last month. Validate your existing workflows against the Gemini 3 model you plan to migrate to. Google recommends testing multiple migration paths rather than committing to one upfront, which is reasonable advice given that the model tiers have different performance and cost characteristics.
Implement thought signature handling. If you are managing API interactions manually (not through the Gen AI SDK's built-in chat features), this is the most important code change to get right. Test it thoroughly, because a missing thought signature will break your requests entirely on Gemini 3.
Review your pricing. Run some representative workloads through Gemini 3 models and compare your token usage and costs against what you are currently spending on Gemini 2.5. This will tell you whether you need to adjust your model tier.
Update Provisioned Throughput assignments. If you have purchased Provisioned Throughput, update your PT assignment to the new model endpoints before the retirement date. Google recommends submitting those requests a month in advance to allow lead time for approvals.
Switch to the Gen AI SDK if you have not already. Google has noted that Vertex AI SDK releases after June 2026 will not support Gemini, and new Gemini features are only available in the Gen AI SDK. If your code still uses the older Vertex AI SDK, this migration is worth doing sooner rather than later.
One more time for clarity: this retirement applies to Vertex AI only. If you are using Gemini models through AI Studio, separate retirement dates apply, and you should check the Gemini deprecations documentation for those timelines.
October 16, 2026 is the earliest these models will be retired, but the actual date will depend on when Gemini 3 reaches GA. Google will give at least six months of notice once that is confirmed. The migration is not just a model endpoint swap. Thought signatures, pricing changes, and SDK updates all need attention. The earlier you start testing, the less pressure you will be under when the date is finalized.