The Gemini API additionally now offers extra granular management over multimodal imaginative and prescient processing, with a media_resolution parameter for configuring what number of tokens are used for picture, video, and doc inputs. Builders can steadiness visible constancy with token utilization. Decision could be set utilizing media_resolution_low, media_resolution_medium, or media_resolution_high. Increased decision boosts the mannequin’s skill to learn tremendous textual content or determine small particulars, Google mentioned.
Beginning with Gemini 3, Gemini API additionally brings again thought signatures to enhance perform calling and picture technology. Thought signatures are encrypted representations of the mannequin’s inside thought course of. By passing these signatures again to the mannequin in subsequent API calls, builders can be certain that Gemini 3 maintains its chain of reasoning throughout a dialog. That is necessary for complicated, multi-step agentic workflows the place preserving the “why” behind a choice is as necessary as the choice itself, Google mentioned.
Moreover, builders now can mix structured outputs with Gemini-hosted instruments, particularly Grounding with Google Search and URL Context. Combining structured outputs is very highly effective for constructing brokers that should fetch reside info from the net or particular internet pages and extract the information right into a JSON format for downstream duties, Google mentioned, noting that it has up to date the pricing of Grounding with Google Search to higher assist agentic workflows. The pricing mannequin adjustments from a flat charge of $35 per 1K prompts to a usage-based charge of $14 per 1,000 search queries.
