Search Engine Configuration

The search engine determines how web searches are performed. The available engines are controlled at the platform level via the ACTIVE_SEARCH_ENGINES environment variable (a JSON array of engine names). Only engines listed in that variable appear as options in the space configuration UI.

The search_engine_config field is a discriminated union -- its schema changes dynamically based on which engines are activated.
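As a minimal sketch of how the platform-level list might gate the available options, the snippet below parses ACTIVE_SEARCH_ENGINES and picks a config variant by engine name. The engine identifiers ("google", "bing"), the class names, and the `engine` discriminator key are assumptions made for illustration; only the environment variable name and the fetch_size and agent_id settings come from this page.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class GoogleConfig:  # hypothetical class name
    fetch_size: int = 5


@dataclass
class BingConfig:  # hypothetical class name
    fetch_size: int = 5
    agent_id: str = ""


# Hypothetical engine identifiers; the real names are platform-defined.
CONFIG_BY_ENGINE = {"google": GoogleConfig, "bing": BingConfig}


def active_engines(env=os.environ):
    """Parse ACTIVE_SEARCH_ENGINES (a JSON array of engine names)."""
    names = json.loads(env.get("ACTIVE_SEARCH_ENGINES", "[]"))
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("ACTIVE_SEARCH_ENGINES must be a JSON array of strings")
    return names


def parse_search_engine_config(raw, env=os.environ):
    """Pick the config variant from the discriminator, allowing only active engines."""
    engine = raw.get("engine")  # assumed discriminator key
    if engine not in active_engines(env):
        raise ValueError(f"engine {engine!r} is not active on this platform")
    fields = {k: v for k, v in raw.items() if k != "engine"}
    return CONFIG_BY_ENGINE[engine](**fields)
```

With `ACTIVE_SEARCH_ENGINES='["google"]'`, a space config that names the Bing engine would be rejected in this sketch, mirroring how inactive engines are absent from the UI.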

Common Settings

All search engines share the following setting:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| fetch_size | integer | 5 | Number of search results to retrieve per query. |


Google Search Engine

Display name: Google Search Engine

Uses the Google Custom Search JSON API to retrieve search results (URLs, titles, snippets). Requires scraping -- a crawler must be used to fetch the actual page content.

Space-Level Config

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| fetch_size | integer | 5 | Number of results to fetch. Pagination is handled automatically (Google API returns max 10 per page). |
| custom_search_config | object | {} | Optional Google Custom Search API parameters (e.g., language, country, safe search). |
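To illustrate the automatic pagination, here is a rough sketch of how a fetch_size larger than 10 could be split into page requests. `num` and `start` are Custom Search JSON API request parameters; the function name `plan_pages` and the choice to merge custom_search_config into every page request are assumptions, not a description of the tool's actual code.

```python
def plan_pages(fetch_size, custom_search_config=None, page_limit=10):
    """Split fetch_size into per-page request parameters.

    `start` is the 1-based index of the first result on a page and
    `num` is the number of results requested for that page.
    """
    extra = dict(custom_search_config or {})
    pages = []
    start = 1
    remaining = fetch_size
    while remaining > 0:
        num = min(remaining, page_limit)
        # Assumption: optional parameters are repeated on every page request.
        pages.append({**extra, "num": num, "start": start})
        start += num
        remaining -= num
    return pages
```

For example, a fetch_size of 25 would yield three requests starting at results 1, 11, and 21, with the last one asking for only 5 results.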

Platform Setup

| Environment Variable | Required | Description |
| --- | --- | --- |
| GOOGLE_SEARCH_API_KEY | Yes | API key for the Google Custom Search JSON API. |
| GOOGLE_SEARCH_ENGINE_ID | Yes | The Custom Search Engine ID (cx parameter). |
| GOOGLE_SEARCH_API_ENDPOINT | Yes | API endpoint URL (typically https://customsearch.googleapis.com/customsearch/v1). |


Bing (Grounding with Bing)

Display name: Grounding with Bing

This is not a standard search engine integration. Instead of calling a search API directly, the tool delegates the entire search to an Azure AI Foundry Agent with a BingGroundingTool attached. The agent autonomously searches Bing, reads the sources, and produces a structured response containing detailed answers and key facts per source. A response parsing pipeline then converts the agent output into standard WebSearchResult objects that the rest of the Web Search tool can process.

How It Works

  1. The tool authenticates with Azure using the configured identity credentials (see Platform -- Azure Authentication).

  2. It connects to the Azure AI project via the AIProjectClient.

  3. It either discovers/creates an agent automatically or uses a pre-configured one (see Operating Modes below).

  4. A thread is created with the user's search query. The agent runs against it with Bing grounding and returns a text response.

  5. The response is parsed into structured results via a multi-strategy pipeline (see Response Parsing).

Operating Modes

The Bing engine has two operating modes, determined by whether agent_id is provided:

Auto-Provisioned Mode (default)

When agent_id is left empty (the default), the tool manages the agent lifecycle automatically:

  1. Lists existing agents in the Azure AI project and looks for one named UNIQUE_GROUNDING_WITH_BING_AGENT.

  2. If not found, creates a new agent with that name using the model specified by AZURE_AI_BING_AGENT_MODEL (required -- no default).

  3. Creates a thread and runs it with per-run overrides:

    • Model -- overridden to AZURE_AI_BING_AGENT_MODEL (platform env var).

    • Toolset -- overridden to a BingGroundingTool configured with AZURE_AI_BING_RESOURCE_CONNECTION_STRING and the space-level fetch_size.

    • Instructions -- overridden to generation_instructions (space config) + a JSON output format rule.

Because all behavior is controlled at execution time, the persisted agent in Azure is just a shell. Space admins control model, instructions, and fetch size from the Spaces UI without needing access to the Azure portal.
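The discover-or-create step and the per-run overrides can be sketched as follows. This does not call the Azure SDK: `FakeAgentStore`, `Agent`, `discover_or_create`, and `run_overrides` are hypothetical stand-ins that only mirror the steps listed above, and the shape of the overrides dict is an assumption.

```python
from dataclasses import dataclass, field

AGENT_NAME = "UNIQUE_GROUNDING_WITH_BING_AGENT"


@dataclass
class Agent:
    id: str
    name: str
    model: str


@dataclass
class FakeAgentStore:
    """In-memory stand-in for the agent listing of an Azure AI project."""
    agents: list = field(default_factory=list)

    def list_agents(self):
        return list(self.agents)

    def create_agent(self, name, model):
        agent = Agent(id=f"agent-{len(self.agents) + 1}", name=name, model=model)
        self.agents.append(agent)
        return agent


def discover_or_create(store, env):
    """Return the shared agent, creating it only if it does not exist yet."""
    model = env.get("AZURE_AI_BING_AGENT_MODEL")
    if not model:
        raise RuntimeError("AZURE_AI_BING_AGENT_MODEL is required (no default)")
    for agent in store.list_agents():
        if agent.name == AGENT_NAME:
            return agent
    return store.create_agent(name=AGENT_NAME, model=model)


def run_overrides(env, fetch_size, instructions):
    """Collect the per-run overrides as a plain dict (illustrative shape)."""
    return {
        "model": env["AZURE_AI_BING_AGENT_MODEL"],
        "bing_tool": {
            "connection": env["AZURE_AI_BING_RESOURCE_CONNECTION_STRING"],
            "count": fetch_size,
        },
        "instructions": instructions,
    }
```

Calling `discover_or_create` twice against the same store returns the same agent, which is why the persisted agent can stay a shell while each run supplies its own model, tool, and instructions.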

Pre-Configured Mode

When agent_id is set (via AZURE_AI_ASSISTANT_ID env var or space config), the tool uses that agent as-is:

  • No overrides are applied -- the agent's own model, tools, and instructions are used.

  • The tool only creates a thread with the user's query and processes the run.

  • The agent must already be configured in Azure with the correct Bing grounding tool, model, and instructions.

This mode is useful when platform engineers want full control over the agent configuration in Azure, or when the agent has been customized beyond what the space-level settings can express.

agent_id and endpoint Resolution

| Setting | Source Priority | Behavior when empty |
| --- | --- | --- |
| endpoint | 1. AZURE_AI_PROJECT_ENDPOINT env var; 2. Space config value | Error -- at least one must be set. |
| agent_id | 1. AZURE_AI_ASSISTANT_ID env var; 2. Space config value | Auto-provisioning (discover or create agent). |

Both settings follow an "env var takes precedence" pattern. The AZURE_AI_PROJECT_ENDPOINT env var lets the platform lock the project endpoint while still allowing space admins to set engine-specific overrides. The AZURE_AI_ASSISTANT_ID env var lets the platform specify a pre-configured agent -- when set, it overrides the space-level agent_id and forces pre-configured mode.
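A minimal sketch of this resolution order, assuming an empty environment variable is treated the same as an unset one (the page does not say). The function names and the returned dict are illustrative only.

```python
def resolve(env_name, space_value, env):
    """Env var wins; otherwise fall back to the space-level value."""
    return env.get(env_name) or space_value or ""


def resolve_bing_settings(space_config, env):
    endpoint = resolve("AZURE_AI_PROJECT_ENDPOINT", space_config.get("endpoint"), env)
    if not endpoint:
        raise ValueError("endpoint must be set via env var or space config")
    agent_id = resolve("AZURE_AI_ASSISTANT_ID", space_config.get("agent_id"), env)
    # A non-empty agent_id selects pre-configured mode.
    mode = "pre-configured" if agent_id else "auto-provisioned"
    return {"endpoint": endpoint, "agent_id": agent_id, "mode": mode}
```

In this sketch, a space-level agent_id is used only when AZURE_AI_ASSISTANT_ID is absent, and leaving both empty falls through to auto-provisioning.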

Space-Level Config

| Setting | Type | Default | Applies To | Description |
| --- | --- | --- | --- | --- |
| fetch_size | integer | 5 | Both modes | Number of search results to retrieve. In auto-provisioned mode, passed to the BingGroundingTool as count. |
| requires_scraping | boolean | false | Both modes | Whether to additionally crawl result URLs. Normally false because the agent returns content directly. |
| agent_id | string | "" | Mode selector | The ID of a pre-configured Azure AI Foundry Agent. Leave empty for auto-provisioning. |
| endpoint | string | "" | Both modes | Azure AI project endpoint. Overridden by AZURE_AI_PROJECT_ENDPOINT env var if set. |
| generation_instructions | string (textarea) | Built-in prompt | Auto-provisioned only | Instructions for the agent on how to search and format results. Ignored in pre-configured mode. |
| language_model | Language model identifier | gpt-4o | Both modes | Fallback LLM used to parse agent responses when the output is not valid JSON (see Response Parsing). |

Generation Instructions

The built-in default generation_instructions instruct the agent to:

  • Search broadly with varied keywords to cover every angle of the query.

  • Read every source thoroughly -- extract every relevant fact, figure, statistic, date, name, and quote.

  • Produce one result entry per source with a detailed_answer and a list of key_facts.

  • Preserve detail -- prefer verbosity over brevity.

At run time, a RESPONSE_RULE suffix is appended that instructs the agent to respond with a JSON object matching the GroundingWithBingResults schema. Space admins can customize the main instructions while the output format rule is always enforced.
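The composition of the final instructions might look like the sketch below. The RESPONSE_RULE wording here is a placeholder, since the real text is not shown on this page, and in the example response only detailed_answer and key_facts are named by this page; the `results`, `url`, and `title` keys are assumptions.

```python
# Placeholder wording: the real RESPONSE_RULE text is not shown on this page.
RESPONSE_RULE = (
    "\n\nRespond only with a JSON object matching the "
    "GroundingWithBingResults schema."
)


def build_instructions(generation_instructions):
    """Append the fixed output format rule to the admin-editable instructions."""
    return generation_instructions.rstrip() + RESPONSE_RULE


# Assumed shape of one conforming response (illustrative keys and values).
example_response = {
    "results": [
        {
            "url": "https://example.com/report",
            "title": "Example report",
            "detailed_answer": "Long, detail-preserving answer drawn from this source.",
            "key_facts": ["Fact one.", "Fact two."],
        }
    ]
}
```

Because the rule is appended after the admin-supplied text, editing generation_instructions changes how the agent searches but not the format it is asked to return.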

Response Parsing

The agent returns a free-text response that must be converted into structured WebSearchResult objects. Two parsing strategies are tried in order:

| Strategy | How it works | When it succeeds |
| --- | --- | --- |
| JSON extraction | Looks for a fenced ```json ... ``` block in the response and validates it against the GroundingWithBingResults schema. | When the agent follows the output format rule and returns valid JSON. |
| LLM fallback | Sends the raw response to the language_model with structured-output enforcement to produce a GroundingWithBingResults object. | When the agent returns useful content but not in the expected JSON format. |

If both strategies fail, the search raises an error.
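The two-stage pipeline can be sketched as below. This is an illustration under assumptions, not the tool's parser: `validate` is a minimal stand-in for schema validation, the `results` key is assumed, and `llm_fallback` is any callable that turns raw text into results (in the real tool this would be the configured language_model).

```python
import json
import re

FENCE = "`" * 3  # built at run time so this example contains no literal fence
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def validate(payload):
    """Minimal stand-in for schema validation (assumed field names)."""
    results = payload["results"]
    for item in results:
        if not isinstance(item["detailed_answer"], str):
            raise TypeError("detailed_answer must be a string")
        if not isinstance(item["key_facts"], list):
            raise TypeError("key_facts must be a list")
    return results


def extract_json(text):
    """Strategy 1: find a fenced json block and validate its contents."""
    match = JSON_BLOCK.search(text)
    if match is None:
        raise ValueError("no fenced json block found")
    return validate(json.loads(match.group(1)))


def parse_agent_response(text, llm_fallback):
    """Try JSON extraction first, then the LLM fallback; raise if both fail."""
    errors = []
    for strategy in (extract_json, llm_fallback):
        try:
            return strategy(text)
        except Exception as exc:  # record the failure and try the next strategy
            errors.append(f"{getattr(strategy, '__name__', 'strategy')}: {exc}")
    raise RuntimeError("could not parse agent response: " + "; ".join(errors))
```

Trying the cheap, deterministic extraction first means the fallback model is only invoked when the agent ignores the output format rule.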

Platform Setup

| Environment Variable | Required | Description |
| --- | --- | --- |
| AZURE_AI_PROJECT_ENDPOINT | Yes | Azure AI project endpoint URL. Takes precedence over the space-level endpoint. |
| AZURE_AI_BING_RESOURCE_CONNECTION_STRING | Yes (auto) | Bing resource connection string from Azure. Used to configure the BingGroundingTool in auto-provisioned mode. |
| AZURE_AI_BING_AGENT_MODEL | Yes (auto) | The deployed model name used when creating/overriding the agent. No default -- must be explicitly set. |
| AZURE_AI_ASSISTANT_ID | Yes (pre-configured) | The ID of a pre-configured Foundry Agent. When set, auto-provisioning is skipped. Takes precedence over the space-level agent_id. |
| DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE | No | Azure credential mode: default or workload (default: default). See Platform -- Azure Authentication. |


Custom API Search Engine

Display name: Customized API Search Engine

Sends search queries to a user-defined REST endpoint. The endpoint must return results matching the WebSearchResults schema (a list of objects with url, title, snippet, and optionally content fields).

Space-Level Config

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| fetch_size | integer | 5 | Number of results to fetch. |
| api_endpoint | string | http://api.example.com | URL of the custom search API. Hidden from UI if set via env var. |
| api_request_method | GET / POST | GET | HTTP method for the API request. Hidden from UI if set via env var. |
| api_headers | string (JSON) | {"Content-Type": "application/json"} | Request headers as a JSON string. Hidden from UI if set via env var. |
| api_additional_query_params | string (JSON) | {} | Additional query parameters as JSON. Hidden from UI if set via env var. |
| api_additional_body_params | string (JSON) | {} | Additional request body parameters as JSON. Hidden from UI if set via env var. |
| requires_scraping | boolean | false | Whether to additionally crawl result URLs. |
| timeout | integer | 120 | Request timeout in seconds. |

For GET requests, the search query is added as a query parameter. For POST requests, it is included in the JSON body as {"query": "..."}.
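The difference between the two methods can be sketched with the standard library. The request is built but not sent. The GET parameter name `query` is an assumption (this page only specifies the POST body key), as is merging the additional params alongside the query.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(query, config):
    """Build (but do not send) the HTTP request for a custom search endpoint."""
    method = config.get("api_request_method", "GET").upper()
    headers = json.loads(config.get("api_headers", "{}"))
    params = json.loads(config.get("api_additional_query_params", "{}"))
    url = config["api_endpoint"]
    body = None
    if method == "GET":
        # Assumption: the query parameter is named "query".
        params = {**params, "query": query}
    elif method == "POST":
        extra = json.loads(config.get("api_additional_body_params", "{}"))
        body = json.dumps({**extra, "query": query}).encode("utf-8")
    else:
        raise ValueError(f"unsupported method: {method}")
    if params:
        url = f"{url}?{urlencode(params)}"
    return Request(url, data=body, headers=headers, method=method)
```

The resulting request could then be passed to `urllib.request.urlopen(request, timeout=...)` with the configured timeout in seconds.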

Platform Setup

| Environment Variable | Required | Description |
| --- | --- | --- |
| CUSTOM_WEB_SEARCH_API_ENDPOINT | No | Default API endpoint. When set, the field is hidden from space-level config. |
| CUSTOM_WEB_SEARCH_API_METHOD | No | Default HTTP method (GET or POST). When set, the field is hidden from space-level config. |
| CUSTOM_WEB_SEARCH_API_HEADERS | No | Default headers (JSON string). When set, the field is hidden from space-level config. |
| CUSTOM_WEB_SEARCH_API_ADDITIONAL_QUERY_PARAMS | No | Default query params (JSON string). When set, the field is hidden from space-level config. |
| CUSTOM_WEB_SEARCH_API_ADDITIONAL_BODY_PARAMS | No | Default body params (JSON string). When set, the field is hidden from space-level config. |
| CUSTOM_WEB_SEARCH_API_CLIENT_CONFIG | No | HTTP client configuration (JSON string). |


Scraping Behavior Summary

| Engine | Requires Scraping | Notes |
| --- | --- | --- |
| Google | Yes | Returns URLs, titles, and snippets only; a crawler must fetch the page content. |
| Bing | Configurable (default: No) | Azure AI Agent returns content with detailed answers and key facts per source. |
| Custom API | Configurable (default: No) | Depends on what the custom API returns. |
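The summary above reduces to a small decision, sketched here with a hypothetical engine key ("google") and function name; only the requires_scraping setting and its default come from this page.

```python
def needs_scraping(engine, config):
    """Decide whether result URLs must be crawled for page content."""
    if engine == "google":  # hypothetical engine key
        return True  # Google results carry no page content
    # Bing and Custom API: follow the space-level flag, which defaults to false.
    return bool(config.get("requires_scraping", False))
```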
