Gemini 3.0 Pro Preview

1 min read

note

We have observed that Gemini 3.0 Pro Preview provisioned by VertexAI can exhibit significantly long response times. This behavior is caused by load conditions on Google’s Vertex AI infrastructure, which can not be influenced by Unique.

How to Use Gemini 3 Pro Preview with UniqueAI via LiteLLM

Gemini 3 Pro Preview supports a large 1M-token context window, but it also comes with important constraints:

  • Quota limit is close to the full context size

  • Full-window LLM calls are slow

  • Large-token executions become expensive very quickly

To ensure fast, stable, and cost-efficient usage with UniqueAI, we recommend the following configuration.


1. Limit Returned Chunks per Tool Call (max 100 chunks)

To avoid hitting token limits, long runtimes, and high costs, internal knowledge search tool calls should be restricted to return max 100 chunks.

Where to configure it

In the Internal Knowledge Search configuration, set:

limit = 100
image-20251204-095044.png

2. Set Gemini “Thinking Level” to Low (for better speed)

Gemini 3 Pro Preview supports two reasoning modes:

  • high (slower, higher cost)

  • low (faster, lower cost)

For most flows in UniqueAI, low provides more than enough reasoning quality while dramatically improving speed and reducing token usage.

Where to configure it

In the Loop Agent configuration under Experimental → Additional LLM Options, add the reasoning_effort as low.

Config path: Advanced SettingsLoop AgentExperimentalAdditional Llm Options

image-20251204-075524.png

Setting

Location

Recommended Value

Max return chunks per tool call

Internal Knowledge Search → limit

100

Reasoning level

Loop Agent → Experimental → Additional Llm Options

reasoning_effort: “low”

Last updated