Search Engine Setup and Configuration


This document describes how each search engine provider works, what setup is required, and what data flows between the platform and the external service.


Google Custom Search

How it works: Sends search queries to the Google Custom Search JSON API, which returns a list of URLs, titles, and snippets. Requires scraping -- the API does not return page content, so a crawler must fetch each page afterward.

Data flow:

  • Outbound: Search query, API key (via query parameter), Custom Search Engine ID.

  • Inbound: JSON response with search result items (link, title, snippet). Pagination is handled automatically (10 results per page).
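The data flow above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function names and the `fetch_size` parameter are hypothetical, while `key`, `cx`, `q`, `start`, and `num` are the query parameters of the Custom Search JSON API.

```python
from urllib.parse import urlencode

# Endpoint value taken from GOOGLE_SEARCH_API_ENDPOINT in this document.
ENDPOINT = "https://customsearch.googleapis.com/customsearch/v1"
PAGE_SIZE = 10  # the API returns at most 10 results per request


def build_page_urls(query: str, api_key: str, engine_id: str, fetch_size: int) -> list:
    """Return one request URL per page needed to collect `fetch_size` results."""
    urls = []
    for start in range(1, fetch_size + 1, PAGE_SIZE):
        params = {
            "key": api_key,   # the API key travels as a query parameter
            "cx": engine_id,  # Custom Search Engine ID
            "q": query,
            "start": start,   # 1-based index of the first result on this page
            "num": min(PAGE_SIZE, fetch_size - start + 1),
        }
        urls.append(f"{ENDPOINT}?{urlencode(params)}")
    return urls


def extract_items(payload: dict) -> list:
    """Keep only the link, title, and snippet of each result item."""
    return [
        {
            "url": item["link"],
            "title": item.get("title", ""),
            "snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])
    ]
```

Because the API key is part of the URL, request URLs built this way should not be written to logs unredacted.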

Setup steps:

  1. Create a project in Google Cloud Console.

  2. Enable the Custom Search JSON API.

  3. Create an API key with appropriate restrictions.

  4. Create a Custom Search Engine at cse.google.com and note the Engine ID.

  5. Set the following environment variables in the assistants-core pod:

| Environment Variable | Value |
| --- | --- |
| GOOGLE_SEARCH_API_KEY | Your Google API key |
| GOOGLE_SEARCH_ENGINE_ID | Your Custom Search Engine ID (e.g., 001111111xxxxxxf42) |
| GOOGLE_SEARCH_API_ENDPOINT | <https://customsearch.googleapis.com/customsearch/v1> |

  6. Add "google" to the ACTIVE_SEARCH_ENGINES JSON array.
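A quick way to check the result of the setup steps is a small validation helper like the one below. It is a sketch under assumptions: it treats all three Google variables as required and expects ACTIVE_SEARCH_ENGINES to hold a JSON array of strings. The function name is hypothetical and not part of the platform.

```python
import json

REQUIRED_GOOGLE_VARS = (
    "GOOGLE_SEARCH_API_KEY",
    "GOOGLE_SEARCH_ENGINE_ID",
    "GOOGLE_SEARCH_API_ENDPOINT",
)


def google_is_configured(env: dict) -> tuple:
    """Return (ready, problems) for the Google engine given an environment mapping."""
    problems = [f"{name} is not set" for name in REQUIRED_GOOGLE_VARS if not env.get(name)]
    try:
        active = json.loads(env.get("ACTIVE_SEARCH_ENGINES", "[]"))
    except json.JSONDecodeError:
        active = None
        problems.append("ACTIVE_SEARCH_ENGINES is not valid JSON")
    # Only check membership when the value parsed successfully.
    if active is not None and (not isinstance(active, list) or "google" not in active):
        problems.append('"google" is missing from ACTIVE_SEARCH_ENGINES')
    return (not problems, problems)
```

Calling `google_is_configured(dict(os.environ))` inside the pod would list any setting that is still missing.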


Bing (Grounding with Bing via Azure AI)

Unlike other search engines that call a search API directly, the Bing integration delegates the entire search to an Azure AI Foundry Agent with a BingGroundingTool attached. The agent autonomously searches Bing, reads each source, and produces a structured response with detailed answers and key facts. The tool then parses this response into standard search results.

Data flow:

  • Outbound: Azure identity token (via SDK), search query and generation instructions sent to the Azure AI Agent service. The agent internally calls the Bing API using the configured connection string -- the assistants-core pod does not call Bing directly.

  • Inbound: Agent thread response containing structured search results (URLs, titles, detailed answers, key facts per source).
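The inbound step, where the agent's structured response is parsed into standard search results, might look roughly like this. The JSON field names (`results`, `detailed_answer`, `key_facts`) and the `SearchResult` class are hypothetical; the actual GroundingWithBingResults schema is not shown in this document and may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    title: str
    snippet: str
    content: str


def parse_agent_response(raw: str) -> list:
    """Convert the agent's JSON text into a list of SearchResult objects.

    Field names are assumptions made for illustration only.
    """
    payload = json.loads(raw)
    results = []
    for source in payload.get("results", []):
        facts = source.get("key_facts", [])
        answer = source.get("detailed_answer", "")
        # Combine the detailed answer and the key facts into one content string.
        content = "\n".join([answer, *[f"- {fact}" for fact in facts]]).strip()
        results.append(
            SearchResult(
                url=source["url"],
                title=source.get("title", ""),
                snippet=facts[0] if facts else answer[:200],
                content=content,
            )
        )
    return results
```

Since the agent already reads each source, a result parsed this way carries content directly and would not need a separate crawling step.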

Azure Authentication

The tool authenticates with Azure using the azure-identity SDK. The credential type is controlled by the DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE environment variable:

| Mode | Credential Class | How it works | Best for |
| --- | --- | --- | --- |
| default | DefaultAzureCredential | Iterates through a chain of credential providers in order: environment variables, managed identity, Azure CLI, Visual Studio Code, Azure PowerShell, and more. Uses the first one that succeeds. | Development environments, VMs with managed identity, most standard deployments. |
| workload | WorkloadIdentityCredential | Uses Kubernetes workload identity with federated credentials. The pod's service account is mapped to an Azure AD application. | AKS clusters with workload identity federation enabled. |

On initialization, the tool validates the credential by requesting a token from the URL specified in DEFAULT_AZURE_IDENTITY_CREDENTIALS_VALIDATE_TOKEN_URL (default: <https://management.azure.com/.default>). If validation fails, the Bing engine reports itself as not configured.
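The credential selection and startup validation described above can be approximated as follows. This is a sketch, not the tool's code: the helper names are hypothetical, and rejecting unknown modes with an error is an assumption. `DefaultAzureCredential`, `WorkloadIdentityCredential`, and `get_token` are part of the azure-identity package.

```python
def resolve_credential_type(env: dict) -> str:
    """Read DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE, falling back to "default"."""
    mode = env.get("DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE", "default").strip().lower()
    if mode not in ("default", "workload"):
        raise ValueError(f"unsupported credential type: {mode!r}")
    return mode


def resolve_validation_scope(env: dict) -> str:
    """Return the token URL used for the startup check."""
    return env.get(
        "DEFAULT_AZURE_IDENTITY_CREDENTIALS_VALIDATE_TOKEN_URL",
        "https://management.azure.com/.default",
    )


def bing_credential_is_valid(env: dict) -> bool:
    """Build the credential and request a token; False means "not configured"."""
    # Imported lazily so the helpers above work without azure-identity installed.
    from azure.identity import DefaultAzureCredential, WorkloadIdentityCredential

    credential_cls = {
        "default": DefaultAzureCredential,
        "workload": WorkloadIdentityCredential,
    }[resolve_credential_type(env)]
    try:
        # Construction and token acquisition can both fail (missing federated
        # token file, no identity available, network errors, and so on).
        credential_cls().get_token(resolve_validation_scope(env))
    except Exception:
        return False
    return True
```

A broad `except` is used here only because any failure leads to the same outcome, the engine being reported as not configured; production code would likely log the underlying error.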

Private Endpoint Transport

When UNIQUE_PRIVATE_ENDPOINT_TRANSPORT_ENABLED is set to true, both the credential provider and the AIProjectClient are created with a RequestsTransport using the system's certifi CA bundle for SSL verification. This is required in environments where Azure endpoints are accessed through Azure Private Link and the default SSL trust chain does not include the necessary certificates.
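A transport of this kind could be built as sketched below. The helper names are hypothetical, and treating only the string "true" as enabled is an assumption. `RequestsTransport` comes from azure-core, and `connection_verify` is its setting for the CA bundle path.

```python
def private_endpoint_transport_enabled(env: dict) -> bool:
    """Interpret UNIQUE_PRIVATE_ENDPOINT_TRANSPORT_ENABLED; only "true" enables it here."""
    value = env.get("UNIQUE_PRIVATE_ENDPOINT_TRANSPORT_ENABLED", "false")
    return value.strip().lower() == "true"


def build_client_kwargs(env: dict) -> dict:
    """Return extra keyword arguments for the credential and the AIProjectClient."""
    if not private_endpoint_transport_enabled(env):
        return {}
    # Lazy imports: azure-core and certifi are only needed when the flag is on.
    import certifi
    from azure.core.pipeline.transport import RequestsTransport

    # connection_verify points the underlying requests session at the certifi CA bundle.
    return {"transport": RequestsTransport(connection_verify=certifi.where())}
```

The returned dictionary would be unpacked into both constructors, for example `DefaultAzureCredential(**build_client_kwargs(env))`. Calling the helper once per constructor gives each object its own transport instance.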

Automatic Agent Creation

The tool supports a zero-config mode where no agent_id needs to be provided in the space configuration. In this mode, the agent is managed automatically.


Key details of auto-provisioning:

  • The agent is named UNIQUE_GROUNDING_WITH_BING_AGENT. If an agent with this name already exists in the project, it is reused.

  • The persisted agent is just a shell -- all behavior (model, tools, instructions) is overridden at run time on the create_thread_and_process_run call.

  • The BingGroundingTool is configured with the AZURE_AI_BING_RESOURCE_CONNECTION_STRING and the space-level fetch_size (number of Bing results to retrieve).

  • The instructions are composed of the space-level generation_instructions + a JSON output format rule that enforces the GroundingWithBingResults schema.

  • The model is set to AZURE_AI_BING_AGENT_MODEL.

When using a pre-configured agent (agent_id is provided via AZURE_AI_ASSISTANT_ID env var or space config), no overrides are applied. The agent's own model, tools, and instructions are used as configured in the Azure portal.
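The auto-provisioning rules above can be summarized in a small sketch. It models the agent lookup and the run-time overrides as plain data rather than real SDK calls. The function names, the dictionary keys, and the wording of the JSON format rule are all hypothetical.

```python
from typing import Optional

AGENT_NAME = "UNIQUE_GROUNDING_WITH_BING_AGENT"

# Placeholder text: the real JSON output format rule is not shown in this document.
JSON_FORMAT_RULE = "Respond only with JSON that conforms to the GroundingWithBingResults schema."


def find_existing_agent(agents_by_name: dict) -> Optional[str]:
    """Return the ID of the shell agent if one with the fixed name already exists."""
    return agents_by_name.get(AGENT_NAME)


def compose_run_overrides(env: dict, generation_instructions: str, fetch_size: int) -> dict:
    """Collect everything that is overridden on each run in auto-provisioned mode."""
    return {
        "model": env["AZURE_AI_BING_AGENT_MODEL"],
        # Space-level instructions come first, then the output format rule.
        "instructions": f"{generation_instructions}\n\n{JSON_FORMAT_RULE}",
        "bing_tool": {
            "connection": env["AZURE_AI_BING_RESOURCE_CONNECTION_STRING"],
            "count": fetch_size,  # number of Bing results to retrieve
        },
    }
```

Because every run carries its own model, tools, and instructions, changing a space's settings takes effect without touching the persisted agent.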

Setup -- Minimal (Auto-Provisioned)

This path requires only environment variables. No manual agent creation in Azure is needed.

  1. Create an Azure AI Project in the Azure portal.

  2. Set up a Bing resource and create a connection string in the Azure AI project.

  3. Deploy an agent-capable model (e.g., gpt-4o) in the Azure AI project.

  4. Ensure the assistants-core pod has a valid Azure identity (managed identity, workload identity, or service principal via environment variables).

  5. Set the following environment variables in the assistants-core pod:

| Environment Variable | Required | Provisioned by Unique Platform | Value |
| --- | --- | --- | --- |
| AZURE_AI_PROJECT_ENDPOINT | Yes | Yes | Azure AI project endpoint URL |
| AZURE_AI_BING_RESOURCE_CONNECTION_STRING | Yes | Yes | Bing resource connection string from the Azure AI project |
| AZURE_AI_BING_AGENT_MODEL | Yes | Yes | The deployed model name to use for the agent (e.g., gpt-4o-deployment) |
| DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE | No | Yes | default or workload (default: default) |

  6. Add "bing" to the ACTIVE_SEARCH_ENGINES JSON array.

The tool will automatically create and manage the Foundry Agent. Space admins can control the generation instructions, fetch size, and fallback parser model from the Spaces UI.

Note: The Unique Platform provides a Terraform module that automates the creation of the required Azure resources -- the AI Foundry project, the Bing resource connection, and the agent model deployment. When this module is used, all variables marked "Provisioned by Unique Platform" are set automatically, making the auto-provisioned mode fully end-to-end with infrastructure as code.

Setup -- Pre-Configured Agent

This path is for deployments where the Foundry Agent is managed externally (e.g., created manually in the Azure portal or by a separate IaC pipeline). In this mode, only the agent ID and the project endpoint are needed -- the tool uses the agent as-is without any overrides.

  1. In the Azure AI portal, create a Microsoft Foundry Agent:

    • Attach a BingGroundingTool using the Bing resource connection string.

    • Configure the model, instructions, and any other agent settings as desired.

    • Note the agent ID.

  2. Set the following environment variables in the assistants-core pod:

| Environment Variable | Required | Value |
| --- | --- | --- |
| AZURE_AI_PROJECT_ENDPOINT | Yes | Azure AI project endpoint URL |
| AZURE_AI_ASSISTANT_ID | Yes | The ID of the pre-configured Foundry Agent |

  3. Add "bing" to the ACTIVE_SEARCH_ENGINES JSON array.

Alternatively, agent_id and endpoint can be set per-space in the space configuration UI instead of (or in addition to) the environment variables. The env var AZURE_AI_ASSISTANT_ID takes precedence over the space-level agent_id, and AZURE_AI_PROJECT_ENDPOINT takes precedence over the space-level endpoint.

In this mode, the space-level generation_instructions and AZURE_AI_BING_AGENT_MODEL are ignored -- the agent uses its own configuration.
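The precedence rules can be expressed compactly. The sketch below assumes the space configuration is available as a dictionary with `agent_id` and `endpoint` keys; the function name and the returned structure are hypothetical.

```python
def resolve_bing_settings(env: dict, space_config: dict) -> dict:
    """Apply "environment variable wins over space-level value" for both settings."""
    agent_id = env.get("AZURE_AI_ASSISTANT_ID") or space_config.get("agent_id")
    endpoint = env.get("AZURE_AI_PROJECT_ENDPOINT") or space_config.get("endpoint")
    return {
        "agent_id": agent_id,
        "endpoint": endpoint,
        # Without an agent ID from either source, the tool provisions the agent itself.
        "mode": "pre-configured" if agent_id else "auto-provisioned",
    }
```

For example, with AZURE_AI_ASSISTANT_ID set in the pod, a different `agent_id` entered in the space configuration would have no effect.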

Environment Variables

| Variable | Required | Default | Mode | Description |
| --- | --- | --- | --- | --- |
| AZURE_AI_PROJECT_ENDPOINT | Yes | null | Both | Azure AI project endpoint URL. Takes precedence over the space-level endpoint. |
| AZURE_AI_BING_RESOURCE_CONNECTION_STRING | Yes (auto) | null | Auto-provisioned | Bing resource connection string. Used to configure the BingGroundingTool. |
| AZURE_AI_BING_AGENT_MODEL | Yes (auto) | -- | Auto-provisioned | The deployed model name used when creating/overriding the agent. No default -- must be explicitly set. |
| AZURE_AI_ASSISTANT_ID | Yes (pre-configured) | null | Pre-configured | The ID of a pre-configured Foundry Agent. When set, auto-provisioning is skipped and no overrides are applied. Takes precedence over the space-level agent_id. |
| DEFAULT_AZURE_IDENTITY_CREDENTIAL_TYPE | No | default | Both | Azure credential type: default (DefaultAzureCredential chain) or workload (WorkloadIdentityCredential). |
| DEFAULT_AZURE_IDENTITY_CREDENTIALS_VALIDATE_TOKEN_URL | No | <https://management.azure.com/.default> | Both | Token URL used to validate credentials at startup. |
| UNIQUE_PRIVATE_ENDPOINT_TRANSPORT_ENABLED | No | false | Both | Enable custom transport with certifi CA bundle for Azure Private Link environments. |


Custom API

How it works: Sends search queries to a user-defined REST endpoint. For GET requests, the query is sent as a query parameter. For POST requests, it is sent in the JSON body as {"query": "..."}. The endpoint must return a JSON response matching the WebSearchResults schema:

```json
{
  "results": [
    {
      "url": "https://example.com/page",
      "title": "Page Title",
      "snippet": "A short description",
      "content": "Full page content (optional)"
    }
  ]
}
```
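A custom endpoint's response can be checked against this shape before it is used. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes `url`, `title`, and `snippet` are required while `content` is optional, as the example suggests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebSearchResult:
    url: str
    title: str
    snippet: str
    content: Optional[str] = None  # full page content, if the endpoint provides it


def parse_web_search_results(payload: dict) -> list:
    """Validate a decoded JSON response and return WebSearchResult objects."""
    items = payload.get("results")
    if not isinstance(items, list):
        raise ValueError('response must contain a "results" array')
    parsed = []
    for index, item in enumerate(items):
        missing = [key for key in ("url", "title", "snippet") if key not in item]
        if missing:
            raise ValueError(f"result {index} is missing {missing}")
        parsed.append(
            WebSearchResult(item["url"], item["title"], item["snippet"], item.get("content"))
        )
    return parsed
```

Running a check like this against a sample response is a cheap way to test a new endpoint before adding it to the platform.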

Data flow:

  • Outbound: Search query, custom headers, additional query/body parameters.

  • Inbound: JSON response with results matching the schema above.

Setup steps:

  1. Deploy or identify a REST API that accepts search queries and returns results in the expected format.

  2. Set the following environment variables in the assistants-core pod (all optional -- can also be configured at the space level):

| Environment Variable | Value |
| --- | --- |
| CUSTOM_WEB_SEARCH_API_ENDPOINT | API endpoint URL |
| CUSTOM_WEB_SEARCH_API_METHOD | GET or POST |
| CUSTOM_WEB_SEARCH_API_HEADERS | JSON string with request headers |
| CUSTOM_WEB_SEARCH_API_ADDITIONAL_QUERY_PARAMS | JSON string with additional query parameters |
| CUSTOM_WEB_SEARCH_API_ADDITIONAL_BODY_PARAMS | JSON string with additional body parameters |
| CUSTOM_WEB_SEARCH_API_CLIENT_CONFIG | JSON string with HTTP client configuration |

  3. Add "custom_api" to the ACTIVE_SEARCH_ENGINES JSON array.
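To show how these settings could combine into an outgoing request, here is a rough sketch. It is not the platform's implementation: the function name is hypothetical, the GET parameter name `query` and the GET default are assumptions, and CUSTOM_WEB_SEARCH_API_CLIENT_CONFIG is left out because its format is not described here.

```python
import json
from urllib.parse import urlencode


def build_custom_api_request(env: dict, query: str) -> dict:
    """Describe the HTTP request for one search query as a plain dictionary."""
    method = env.get("CUSTOM_WEB_SEARCH_API_METHOD", "GET").upper()
    endpoint = env["CUSTOM_WEB_SEARCH_API_ENDPOINT"]
    headers = json.loads(env.get("CUSTOM_WEB_SEARCH_API_HEADERS", "{}"))
    query_params = json.loads(env.get("CUSTOM_WEB_SEARCH_API_ADDITIONAL_QUERY_PARAMS", "{}"))
    body = None
    if method == "GET":
        # GET: the search query travels as a query parameter.
        query_params = {**query_params, "query": query}
    elif method == "POST":
        # POST: the search query travels in the JSON body as {"query": "..."}.
        extra = json.loads(env.get("CUSTOM_WEB_SEARCH_API_ADDITIONAL_BODY_PARAMS", "{}"))
        body = {**extra, "query": query}
    else:
        raise ValueError(f"unsupported method: {method}")
    url = f"{endpoint}?{urlencode(query_params)}" if query_params else endpoint
    return {"method": method, "url": url, "headers": headers, "json": body}
```

The returned dictionary maps directly onto the arguments of common HTTP clients, which keeps the request logic easy to test without network access.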

Note: When environment variables are set for the Custom API, those fields are hidden from the space-level configuration UI. This allows platform admins to lock down the endpoint configuration while still allowing space admins to adjust other settings.
