Gemini 3.0 Pro Preview

1 min read

We have observed that Gemini 3.0 Pro Preview provisioned by VertexAI can exhibit significantly long response times. This behavior is caused by load conditions on Google’s Vertex AI infrastructure, which can not be influenced by Unique.

How to Use Gemini 3 Pro Preview with UniqueAI via LiteLLM

Gemini 3 Pro Preview supports a large 1M-token context window, but it also comes with important constraints:

Quota limit is close to the full context size
Full-window LLM calls are slow
Large-token executions become expensive very quickly

To ensure fast, stable, and cost-efficient usage with UniqueAI, we recommend the following configuration.

1. Limit Returned Chunks per Tool Call (max 100 chunks)

To avoid hitting token limits, long runtimes, and high costs, internal knowledge search tool calls should be restricted to return max 100 chunks.

Where to configure it

In the Internal Knowledge Search configuration, set:

limit = 100

2. Set Gemini “Thinking Level” to Low (for better speed)

Gemini 3 Pro Preview supports two reasoning modes:

high (slower, higher cost)
low (faster, lower cost)

For most flows in UniqueAI, low provides more than enough reasoning quality while dramatically improving speed and reducing token usage.

Where to configure it

In the Loop Agent configuration under Experimental → Additional LLM Options, add the reasoning_effort as low.

Config path: Advanced Settings → Loop Agent → Experimental → Additional Llm Options

Recommended Configuration Summary

Setting	Location	Recommended Value
Max return chunks per tool call	Internal Knowledge Search → limit	100
Reasoning level	Loop Agent → Experimental → Additional Llm Options	reasoning_effort: “low”