Gemini 3 Pro Preview
We have observed that Gemini 3 Pro Preview, when provisioned through Vertex AI, can exhibit very long response times. This behavior is caused by load conditions on Google's Vertex AI infrastructure, which Unique cannot influence.
How to Use Gemini 3 Pro Preview with UniqueAI via LiteLLM
Gemini 3 Pro Preview supports a large 1M-token context window, but it also comes with important constraints:
Quota limit is close to the full context size
Full-window LLM calls are slow
Large-token executions become expensive very quickly
To ensure fast, stable, and cost-efficient usage with UniqueAI, we recommend the following configuration.
1. Limit Returned Chunks per Tool Call (max 100 chunks)
To avoid hitting token limits, long runtimes, and high costs, restrict each internal knowledge search tool call to return at most 100 chunks.
Where to configure it
In the Internal Knowledge Search configuration, set:
limit = 100
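The effect of the cap can be sketched in a few lines of Python. This is only an illustration: the helper names `cap_chunks` and `estimate_tokens` and the figure of 500 tokens per chunk are assumptions made for the example, not values taken from UniqueAI or Vertex AI.

```python
from typing import List

MAX_CHUNKS = 100  # recommended value for the `limit` setting


def cap_chunks(chunks: List[str], limit: int = MAX_CHUNKS) -> List[str]:
    """Keep only the first `limit` chunks (assumes they arrive ranked by relevance)."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return chunks[:limit]


def estimate_tokens(num_chunks: int, tokens_per_chunk: int) -> int:
    """Rough upper bound on the tokens that search results add to a prompt."""
    return num_chunks * tokens_per_chunk


# Hypothetical tool call that found 250 matching chunks.
hits = [f"chunk-{i}" for i in range(250)]
kept = cap_chunks(hits)

ASSUMED_TOKENS_PER_CHUNK = 500  # assumption for illustration only
budget = estimate_tokens(len(kept), ASSUMED_TOKENS_PER_CHUNK)
print(len(kept), budget)
```

Under these assumed numbers, one capped tool call contributes a small fraction of the 1M-token window, whereas several uncapped calls in a row could approach the quota.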
2. Set Gemini “Thinking Level” to Low (for better speed)
Gemini 3 Pro Preview supports two reasoning modes:
high (slower, higher cost)
low (faster, lower cost)
For most flows in UniqueAI, low provides more than enough reasoning quality while dramatically improving speed and reducing token usage.
Where to configure it
In the Loop Agent configuration under Experimental → Additional LLM Options, add the option reasoning_effort with the value low.
Config path: Advanced Settings → Loop Agent → Experimental → Additional Llm Options
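As a rough sketch, the option can be thought of as a single key-value pair. The snippet below builds and validates that pair in Python and prints it as JSON. The function name `additional_llm_options` is hypothetical, and the assumption that the Additional Llm Options field accepts a JSON object that is forwarded to the LiteLLM call is not confirmed here; check the field in your own environment.

```python
import json

# The two reasoning levels described above.
ALLOWED_LEVELS = {"low", "high"}


def additional_llm_options(reasoning_effort: str = "low") -> dict:
    """Build the extra LLM options, rejecting unknown reasoning levels."""
    if reasoning_effort not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
    return {"reasoning_effort": reasoning_effort}


options = additional_llm_options("low")
options_json = json.dumps(options)
print(options_json)  # {"reasoning_effort": "low"}
```

Validating the value before saving it helps catch typos such as "Low" or "minimal", which would otherwise only surface as an error or an ignored option at request time.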

Recommended Configuration Summary
| Setting | Location | Recommended Value |
|---|---|---|
| Max return chunks per tool call | Internal Knowledge Search → limit | 100 |
| Reasoning level | Loop Agent → Experimental → Additional Llm Options | reasoning_effort: "low" |