Search Engine Configuration

This page documents every setting that appears under Search Engine in the Spaces configuration UI. Each setting is listed by its UI label, alongside the underlying configuration field name, type, and default.

The list of available engines is decided at the platform level by the ACTIVE_SEARCH_ENGINES environment variable. Only engines that are on that list and have their required credentials provisioned appear in the Search Engine selector. See the Activation Reference page under Platform / Infrastructure for what controls availability.

For a description of how each engine performs its searches at runtime (sequence diagrams, identity flows, what data leaves the platform), see the Search Engines page under Platform / Architecture. For deployment-side credential setup (API keys, identity bindings, env vars), see the Search Engine Setup page under Platform / Infrastructure.

Common setting

Every engine shares one setting:

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`

Higher values bring more URLs into the pipeline at the cost of more downstream work (crawling, content processing, relevancy sorting). Per-engine pagination is handled automatically.

Google Search

Display name in the UI: Google Search

What it does

Sends your search query to Google's Custom Search JSON API and returns a list of URLs, titles, and snippets. Google does not return page content directly, so the configured Web Page Reader runs after the search to fetch each page.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
Custom Search Config	`custom_search_config`	object	`{}`

Custom Search Config exposes optional Google Custom Search API parameters (language, country code, safe search level, etc.) for fine-tuning results. Leave empty for default behaviour.
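
As an illustration, a Custom Search Config might look like the sketch below. The keys `lr`, `gl`, and `safe` are parameters of Google's Custom Search JSON API. That Spaces forwards the object to the API as extra query parameters is an assumption here, and `build_request_params` is a hypothetical helper, not part of the platform.

```python
# Sketch only: assumes custom_search_config is forwarded as extra query
# parameters to the Custom Search JSON API. build_request_params is hypothetical.

custom_search_config = {
    "lr": "lang_de",   # restrict results to documents written in German
    "gl": "ch",        # boost results for Switzerland (two-letter country code)
    "safe": "active",  # turn SafeSearch filtering on
}


def build_request_params(query: str, fetch_size: int, extra: dict) -> dict:
    """Merge optional Custom Search parameters with the core request fields."""
    params = dict(extra)  # copy so the config object is not mutated
    params["q"] = query   # core fields overwrite anything with the same key in `extra`
    # The Custom Search JSON API returns at most 10 results per request.
    params["num"] = min(fetch_size, 10)
    return params


params = build_request_params("quarterly report", 5, custom_search_config)
print(params)
```

Leaving the object empty (`{}`) simply means no optional parameters are added.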

When to use it

Default for most Spaces. Reliable, predictable, and well-known. Best when you want full control over the configured Custom Search Engine (CSE) — for example, restricting to a curated list of domains via the CSE configuration in Google Cloud.

Limitation: Google's Custom Search JSON API is being phased out by Google. New search engines can no longer be configured for open-web search after January 20, 2026, and existing open-web search engines will be disabled on January 1, 2027. Plan a migration to one of the alternatives below before then.

Grounding with Bing

Display name in the UI: Grounding with Bing

What it does

This is not a classic search API integration. Instead, the tool delegates the search to an Azure AI Foundry Agent with Bing grounding enabled. The agent autonomously searches Bing, reads each source, and produces a structured response with detailed answers and key facts per source. The Web Search tool then parses that response into standard search results.

Because the agent already returns content directly, no separate page crawl is needed by default.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
Requires Scraping	`requires_scraping`	boolean	`false`
Agent ID	`agent_id`	string	empty
Endpoint	`endpoint`	string	empty
Generation Instructions	`generation_instructions`	textarea	Built-in prompt
Language Model	`language_model`	language model	`gpt-4o`

Number of search results to fetch — Passed to the Bing grounding tool as the result count when the agent is auto-provisioned.
Requires Scraping — Normally false because the agent returns content directly. Set to true if you want the configured Web Page Reader to additionally fetch the result URLs (rare).
Agent ID — When empty (the default), the tool auto-provisions an Azure AI Foundry Agent named UNIQUE_GROUNDING_WITH_BING_AGENT and configures the model, tools, and instructions per run. When set, the tool uses that pre-configured agent as-is and ignores Generation Instructions and the platform-level model setting.
Endpoint — Azure AI project endpoint. Normally provisioned at the platform level via AZURE_AI_PROJECT_ENDPOINT; the platform value takes precedence when set.
Generation Instructions — Instructions given to the agent on how to search and structure results. Only applies in auto-provisioned mode (when Agent ID is empty). The output-format rule is always appended automatically.
Language Model — Fallback model used to parse the agent's response when it does not return valid JSON.
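
The role of the Language Model setting can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `results` key, the `parse_agent_response` function, and the `fallback_parse` callable (standing in for a call to the configured language model) are all hypothetical.

```python
import json
from typing import Callable


def parse_agent_response(
    raw: str,
    fallback_parse: Callable[[str], list[dict]],
) -> list[dict]:
    """Return search results from the agent's reply.

    Strict JSON parsing is tried first. Only when that fails is the
    (hypothetical) language-model fallback asked to extract results.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return fallback_parse(raw)
    # Assumption: valid JSON with an unexpected shape also goes to the fallback.
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        return fallback_parse(raw)
    return payload["results"]


def fake_llm_parse(raw: str) -> list[dict]:
    # Stand-in for the language model call, used here only for demonstration.
    return [{"url": "", "title": "", "snippet": raw.strip()}]


good = parse_agent_response('{"results": [{"url": "https://example.com"}]}', fake_llm_parse)
bad = parse_agent_response("Here is what I found: ...", fake_llm_parse)
```

In this sketch the fallback is never invoked for well-formed replies, so the extra model call only adds latency when the agent deviates from the output-format rule.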

When to use it

When you want answers grounded in Bing's full search index without managing a search-API integration yourself, and you have an Azure AI Foundry project available. Result quality is good, but latency is significantly higher than with direct-API engines (typically 15-20 seconds versus 4-6 seconds).

Grounding with VertexAI

Display name in the UI: Grounding with VertexAI

What it does

Like Grounding with Bing, this is not a classic search API integration. The tool calls a Gemini model on Google Cloud's Vertex AI with Google's grounding tool attached. Gemini autonomously runs web searches, reads sources, and returns a grounded text answer with citation metadata. The Web Search tool then parses that response into standard search results.

By default, redirect URLs returned by Vertex (vertexaisearch.cloud.google.com/grounding-api-redirect/...) are followed to their canonical destinations so citations point to the real site.

Because Gemini returns content directly, no separate page crawl is needed by default.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
VertexAI Model Name	`vertexai_model_name`	string	`gemini-3-flash-preview`
Generation Instructions	`generation_instructions`	textarea	Built-in prompt
Fallback Language Model	`fallback_language_model`	language model	Toolkit default
Requires Scraping	`requires_scraping`	boolean	`false`
Enable Enterprise Search	`enable_entreprise_search`	boolean	`false`
Enable Redirect Resolution	`enable_redirect_resolution`	boolean	`true`

VertexAI Model Name — The Gemini model deployed in the Vertex project. Must support grounding.
Generation Instructions — Instructions given to Gemini on how to search and structure results. The output-format rule is always appended automatically.
Fallback Language Model — Used to parse Gemini's response when it does not return valid JSON.
Enable Enterprise Search — When true, uses Vertex Enterprise Search instead of public Google Search. Requires Enterprise Search to be enabled on the bound GCP project.
Enable Redirect Resolution — When true, follows redirects on each result URL so Vertex grounding-redirect URLs become canonical site URLs. Recommended on.
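
Redirect resolution can be pictured with the sketch below, which uses only the Python standard library. It is an illustration under assumptions, not the platform's code: `follow_redirects` and `resolve_urls` are hypothetical names, and keeping the original URL when a request fails is a design choice made for this example.

```python
import urllib.request
from typing import Callable, Iterable


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Request the URL and return the final address after redirects (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def resolve_urls(
    urls: Iterable[str],
    enabled: bool = True,
    follow: Callable[[str], str] = follow_redirects,
) -> list[str]:
    """Return canonical URLs, leaving a URL unchanged when resolution is off or fails."""
    if not enabled:
        return list(urls)
    resolved = []
    for url in urls:
        try:
            resolved.append(follow(url))
        except OSError:
            # URLError and socket timeouts are OSError subclasses; keep the
            # redirect URL rather than drop the citation entirely.
            resolved.append(url)
    return resolved
```

Passing `follow` as a parameter keeps the logic testable without network access, for example by substituting a lookup table for real HTTP requests.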

When to use it

When you want a Google-quality grounded answer and your platform deployment is already on Google Cloud (Workload Identity available). The latency profile is similar to that of Grounding with Bing.

Brave Search

Display name in the UI: Brave Search

What it does

Sends your search query to Brave's Search API. Returns URLs, titles, and rich snippets (including extra snippet fragments per result). Snippets are usually descriptive enough that page crawling is not required.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
Requires Scraping	`requires_scraping`	boolean	`false`

When to use it

Fast, privacy-respecting alternative to Google. Good snippet quality means the configured Web Page Reader often does not need to fetch full pages. Pagination is capped at 20 results per request.
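
The 20-results-per-request cap means that a larger fetch size has to be split across several requests. The platform handles this automatically; the sketch below only illustrates the arithmetic. `plan_requests` is a hypothetical helper, and the `count` / `offset` names follow Brave's Web Search API, where `offset` is a zero-based page index.

```python
BRAVE_MAX_PER_REQUEST = 20  # per-request cap stated above


def plan_requests(fetch_size: int, page_cap: int = BRAVE_MAX_PER_REQUEST) -> list[dict]:
    """Split fetch_size into per-request parameter sets.

    Every request uses the same `count` so that page offsets line up;
    the caller is expected to trim the combined results to fetch_size.
    """
    if fetch_size <= 0:
        return []
    count = min(fetch_size, page_cap)
    pages = -(-fetch_size // count)  # ceiling division
    return [{"count": count, "offset": page} for page in range(pages)]


print(plan_requests(5))
print(plan_requests(45))
```

With the default fetch size of 5, a single request is enough. A fetch size of 45 needs three requests of 20 results each, with the surplus trimmed afterwards.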

Jina Search

Display name in the UI: Jina Search

What it does

Sends your search query to Jina's Search API. Returns URLs, titles, descriptions, and full Markdown page content in one call. No separate Web Page Reader run is needed.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`

Jina Search also accepts a number of optional locale, behavior, and rendering parameters (country, language, in-site filter, return format, browser-mode rendering, etc.) that can be set in the configuration object. Leave them empty unless you have a specific need.

When to use it

When you want a single API that does both search and content extraction, with Markdown output ready for the content pipeline. Good for sites that need browser-mode rendering.

Tavily Search

Display name in the UI: Tavily Search

What it does

Sends your search query to Tavily's Search API. Returns URLs, titles, snippets, and (in advanced depth) raw page content in one call. No separate Web Page Reader run is needed.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
Search Depth	`search_depth`	`basic` / `advanced`	`advanced`
Topic	`topic`	`general` / `news` / `finance` / unset	unset

Search Depth — advanced returns more thorough page content; basic is faster but lighter.
Topic — Optional topic-specific search optimisation. Leave unset for general web search.
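
As a rough sketch of how these two settings could translate into a request, the example below builds a Tavily request body. The body keys `query`, `max_results`, `search_depth`, and `topic` are Tavily Search API parameters; the mapping from the Spaces fields onto them and the `build_tavily_payload` helper are assumptions made for illustration.

```python
from typing import Optional

VALID_DEPTHS = {"basic", "advanced"}
VALID_TOPICS = {"general", "news", "finance"}


def build_tavily_payload(
    query: str,
    fetch_size: int = 5,
    search_depth: str = "advanced",
    topic: Optional[str] = None,
) -> dict:
    """Build a request body, omitting `topic` entirely when it is unset."""
    if search_depth not in VALID_DEPTHS:
        raise ValueError(f"search_depth must be one of {sorted(VALID_DEPTHS)}")
    if topic is not None and topic not in VALID_TOPICS:
        raise ValueError(f"topic must be one of {sorted(VALID_TOPICS)} or unset")
    payload = {
        "query": query,
        "max_results": fetch_size,
        "search_depth": search_depth,
    }
    if topic is not None:
        payload["topic"] = topic
    return payload


default_payload = build_tavily_payload("central bank rates")
news_payload = build_tavily_payload("central bank rates", topic="news", search_depth="basic")
```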

When to use it

When latency matters and you want a single API for search + extraction. Tavily's advanced mode produces clean Markdown content ready for the pipeline.

Firecrawl Search

Display name in the UI: Firecrawl Search

What it does

Sends your search query to Firecrawl's Search API and returns results from configurable sources (web and/or news). Each result includes title, snippet, and full Markdown content. No separate Web Page Reader run is needed.

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
Sources	`sources`	list of `web`, `news`	`[web, news]`

When to use it

When you want results that combine web and news sources, with clean extracted Markdown content per result.

Customized API Search

Display name in the UI: Customized API Search Engine

What it does

Sends your search query to a user-defined REST endpoint and expects a response that matches the standard Web Search results schema. Lets you plug in a custom search service or in-house search index without building a new engine integration.

For GET requests, the query is added as a query parameter. For POST requests, it is included in the JSON body as {"query": "..."}. The response must be a JSON object with a results array, each item containing url, title, snippet, and optionally content.
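
The request and response contract described above can be sketched as follows. This is an illustration rather than the platform's implementation: `build_request` and `validate_response` are hypothetical helpers, and the GET parameter name `query` is an assumption, since the text only specifies the key for POST bodies.

```python
import json
from typing import Optional
from urllib.parse import urlencode

REQUIRED_KEYS = ("url", "title", "snippet")  # `content` is optional


def build_request(endpoint: str, method: str, query: str) -> tuple[str, Optional[str]]:
    """Return (url, body) for the configured request method."""
    if method == "GET":
        # Assumption: the query parameter is named "query".
        return f"{endpoint}?{urlencode({'query': query})}", None
    if method == "POST":
        return endpoint, json.dumps({"query": query})
    raise ValueError("method must be GET or POST")


def validate_response(payload: object) -> list[dict]:
    """Check that a decoded response matches the expected results schema."""
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        raise ValueError("response must be a JSON object with a 'results' array")
    for index, item in enumerate(payload["results"]):
        if not isinstance(item, dict):
            raise ValueError(f"result {index} is not an object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"result {index} is missing: {', '.join(missing)}")
    return payload["results"]


get_url, get_body = build_request("http://api.example.com", "GET", "open source")
post_url, post_body = build_request("http://api.example.com", "POST", "open source")

sample = {
    "results": [
        {"url": "https://example.com", "title": "Example", "snippet": "An example page."}
    ]
}
results = validate_response(sample)
```

A validator like this is useful while developing a custom endpoint, because a schema mismatch surfaces as an explicit error instead of an empty result list.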

Settings

UI Label	Field	Type	Default
Number of search results to fetch	`fetch_size`	integer	`5`
API Endpoint	`api_endpoint`	string	`http://api.example.com`
API Request Method	`api_request_method`	`GET` / `POST`	`GET`
API Headers	`api_headers`	string (JSON)	`{"Content-Type": "application/json"}`
API Additional Query Params	`api_additional_query_params`	string (JSON)	`{}`
API Additional Body Params	`api_additional_body_params`	string (JSON)	`{}`
Search Engine Mode	`search_engine_mode`	`standard` / `agent`	`standard`
Requires Scraping	`requires_scraping`	boolean	`false`
Timeout	`timeout`	integer (seconds)	`120`

Search Engine Mode — Controls how the orchestrator AI is prompted to use this engine. Use standard for classic SERP-style endpoints and agent for agent-style endpoints that already do their own search-and-read.
Requires Scraping — Set to true if your endpoint returns only URLs and snippets so the configured Web Page Reader will fetch page content.
Timeout — Per-request timeout. Increase if your endpoint is slow (e.g. agent-style endpoints that do internal browsing).

Platform-side override: when the matching CUSTOM_WEB_SEARCH_API_* environment variable is set, the corresponding field is hidden from the Spaces UI and locked to the env-var value. This lets platform admins fix the endpoint configuration centrally while still allowing space admins to adjust other settings. See Search Engine Setup under Platform / Infrastructure.

When to use it

When the engine you want is not one of the built-ins, or when you have an internal search service (curated content index, document store, vendor API) you want to expose to AI assistants.

Does this engine return content directly?

Engine	Returns content directly	Notes
Google Search	No	The Web Page Reader runs after the search to fetch each page.
Grounding with Bing	Yes (default)	Configurable via Requires Scraping.
Grounding with VertexAI	Yes (default)	Configurable via Requires Scraping.
Brave Search	No (snippets only)	The Web Page Reader runs after the search only when Requires Scraping is set to `true`.
Jina Search	Yes	Markdown content returned per result.
Tavily Search	Yes (in `advanced` depth)	Raw content returned per result.
Firecrawl Search	Yes	Markdown content returned per result.
Customized API Search	Depends on your endpoint	Set Requires Scraping to `true` if your endpoint returns URLs only.