Web Search Module

Motivation

The Web Search module enables real-time access to internet information through a chat interface. By integrating live web searches directly into conversations, users receive accurate and up-to-date answers to their questions, helping them stay informed, make better decisions, and solve problems using the latest available data.

Functionality

This module extends the language model with web search capabilities, allowing it to respond to queries that exceed its built-in knowledge, such as current events or other time-sensitive topics.

When a user submits a question, the model follows this decision flow:

Use of Built-in Knowledge:
The model first checks whether it can answer using its internal knowledge. If the information is sufficient, it responds directly—no web search or external sources are used.
Triggering a Web Search:
If the model determines that up-to-date or external information is needed, it automatically performs the following steps:
- Generate a Search Query
  The module creates a search query based on the full conversation history, optimized for effective use with search engines.
- Perform a Web Search
  It uses the configured search engine (e.g., Google or Bing) to retrieve relevant web pages.
- Extract Web Page Content
  The HTML content of the top-ranked search results is collected and processed.
- Split the Content
  Extracted text is divided into smaller chunks to make it easier to search and understand.
- [Optional] Run Similarity Search
  If enabled, the text chunks are embedded and compared with the generated query to identify the most relevant information.
- Answer the User
  The most relevant content is included in the context, allowing the model to generate a well-informed response to the user’s question.
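
The chunking and similarity steps above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function names are hypothetical, chunk size is measured in characters here (the unit used by the module is not stated), and a simple word-count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Split text into pieces of at most chunk_size characters (assumed unit)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, bag_of_words(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


# Two placeholder "pages" standing in for extracted web content.
pages = [
    "The central bank raised interest rates by 25 basis points today.",
    "A recipe for sourdough bread with a long fermentation.",
]
chunks = [chunk for page in pages for chunk in split_into_chunks(page, chunk_size=80)]
best = rank_chunks("interest rates decision", chunks, top_k=1)
print(best[0])
```

In this sketch only the highest-ranked chunks would be placed in the model's context, which is the role the optional similarity search plays in the flow above.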

Supported Search Engines

Unique supports the Google Custom Search API engine. Details on configuration, as well as security and privacy considerations, can be found here: Web Search Administration

Google allows you to restrict search results to specific domains. This means users can configure a list of trusted sources, ensuring the search engine only returns results from those domains. For more details, see the configuration section below.
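
As a rough illustration of a domain restriction, the sketch below builds a configuration fragment that limits results to a single site using the `siteSearch` and `siteSearchFilter` parameters of the Google Custom Search API. Several things are assumptions here: that the module accepts a partial override like this, that its `siteSearchFilter` values mirror Google's (`"i"` to include, `"e"` to exclude), and the domain itself, which is a placeholder. A longer list of trusted domains is typically defined in the Programmable Search Engine itself and selected through `cx`.

```python
import json

# Hypothetical override fragment: only the keys shown differ from the defaults.
search_engine_override = {
    "searchEngineConfig": {
        "searchEngineName": "Google",
        "customSearchConfig": {
            "siteSearch": "example.com",  # placeholder domain
            "siteSearchFilter": "i",      # "i" = include, "e" = exclude (Google API values)
        },
    }
}

print(json.dumps(search_engine_override, indent=4))
```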

FAQ

When I set up the WebSearch Module in a space, should I also include the Chat with GPT module?

No, Chat with GPT does not need to be added as a separate module. WebSearch has two modes: either the model decides whether it can answer from its own knowledge (e.g., some translations) or needs to search the web for more details, or it always searches the web to find the answer.

Can I also access sites that are behind a paywall or that need a login?

No, it is not possible to access sites that are behind a paywall or that require a login to read the content.

Does the module only retrieve the static part of pages?

Yes. Content generated by dynamic scripts cannot be retrieved.

Reference in Code in AI Module Template

WebSearch

Configuration settings

The WebSearchConfig schema configures the web-based question answering system. It covers the system and trigger prompt templates, token limits, the number of history interactions to include, and the settings for each search component: search engine, query generation, crawler, content processing, and chunk relevancy sorting.

Default Configuration

json

{
    "languageModel": "AZURE_GPT_4o_2024_1120",
    "questionAnsweringSystemPromptTemplate": "You are helping the employees with their questions. You will find below a question and some sources extracted from webpages (they are delimited with XML tags).\n\nAnswer the employee's question using ONLY facts from the sources or past conversation. Information helping the employee's question can also be added.\n\nIf not specified, format the answer using an introduction followed by a list of bullet points. The facts you add should ALWAYS help answering the question.\n\nSTRICTLY reference each fact you use. A fact is preferably referenced by ONLY ONE source e.g [sourceX].\n\nHere is an example on how to reference sources (referenced facts must STRICTLY match the source number):\n- Some information retrieved from source N°X.[sourceX]\n- Some information retrieved from source N°Y and some information retrieved from source N°Z.[sourceY][sourceZ]\n\nMake sure to respect the markdown link format and wrap each link reference between parenthesis.\n\nCurrent date: $current_date.\n",
    "questionAnsweringTriggerPromptTemplate": "sources:\n```\n$content\n```\n\nquestion:\n```\n$query\n```\n\nAnswer concisely in $language (same language as question) and ALWAYS reference each of your facts:",
    "toolSearchSystemPromptTemplate": "You are ChatGPT, a large language model trained by OpenAI.\n\nKnowledge cutoff: $info_cutoff_at.\nCurrent date: $current_date.\n\nYou have access to information in the web, via the `$web_search` tool. If this tool is helpful to answer the question, you can use it to find relevant information.\n",
    "toolSearchTriggerPromptTemplate": "$query",
    "limitTokenSources": 10000,
    "numberHistoryInteractionsIncluded": 6,
    "searchEngineConfig": {
        "searchEngineName": "Google",
        "fetchSize": 10,
        "urlPatternBlacklist": [
            ".*\\.pdf$"
        ],
        "bannedDomains": [],  
        "customSearchConfig": {
            "cx": null,
            "c2Coff": null,
            "cr": null,
            "exactTerms": null,
            "excludeTerms": null,
            "fileType": null,
            "filter": null,
            "gl": null,
            "highRange": null,
            "hl": null,
            "hq": null,
            "linkSite": null,
            "lowRange": null,
            "lr": null,
            "orTerms": null,
            "rights": null,
            "safe": "active",
            "searchType": null,
            "siteSearch": null,
            "siteSearchFilter": null,
            "sort": null
        }
    },
    "queryGenerationConfig": {
        "skipQueryGeneration": false,
        "queryInstruction": "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message."
    },
    "crawlerConfig": {
        "crawlerType": "BasicCrawler",
        "semaphoreCount": 10,
        "timeout": 5.0,
        "cleaningStrategyConfig": {
            "markdownCleaningTimeout": 5.0,
            "removeNestedImagesAndLinks": true,
            "removeSimpleImagesAndLinks": true,
            "removeMultipleLinebreaks": true,
            "removeRepeatingPatterns": false
        }
    },
    "contentAdapterConfig": {
        "chunkSize": 1000,
        "chunkingMaxWorkers": 10,
        "contentProcessingStrategyConfig": {
            "strategy": "truncate",
            "truncateToMaxTokens": 5000
        }
    },
    "chunkRelevancySortConfig": {
        "enabled": false,
        "relevancyLevelsToConsider": [
            "high",
            "medium",
            "low"
        ],
        "relevancyLevelOrder": {
            "high": 0,
            "medium": 1,
            "low": 2
        },
        "languageModel": "AZURE_GPT_35_TURBO_0125",
        "fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
        "additionalLlmOptions": {},
        "structuredOutputConfig": {
            "enabled": false,
            "extract_fact_list": false,
            "reason_description": "A brief explanation justifying your evaluation decision.",
            "value_description": "Assessment of how relevant the facts are to the query. Must be one of: ['low', 'medium', 'high'].",
            "fact_description": "A fact is an information that is directly answers the user's query. Make sure to emphasize the important information from the fact with bold text.",
            "fact_list_description": "A list of relevant facts extracted from the source that supports or answers the user's query."
        },
        "maxTasks": null
    }
}
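
To make the `urlPatternBlacklist` and `bannedDomains` settings more concrete, here is a small sketch of how search results might be filtered with them. The matching rules are assumptions (patterns applied with `re.match` against the full URL, banned domains also covering their subdomains), and `is_allowed` and the banned domain are hypothetical; the module's actual behavior may differ.

```python
import re
from urllib.parse import urlparse

URL_PATTERN_BLACKLIST = [r".*\.pdf$"]  # default value from the configuration above
BANNED_DOMAINS = ["blocked.example"]   # placeholder; the default list is empty


def is_allowed(url: str) -> bool:
    """Return False if the URL matches a blacklisted pattern or a banned domain."""
    if any(re.match(pattern, url) for pattern in URL_PATTERN_BLACKLIST):
        return False
    host = urlparse(url).hostname or ""
    return not any(
        host == domain or host.endswith("." + domain) for domain in BANNED_DOMAINS
    )


urls = [
    "https://example.com/report.pdf",
    "https://example.com/news/article",
    "https://www.blocked.example/page",
]
allowed = [url for url in urls if is_allowed(url)]
print(allowed)
```

Because the default pattern is anchored at the end of the URL, a link such as `https://example.com/a.pdf?download=1` would not be caught by it under these assumptions; a broader pattern would be needed for that case.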

General parameters

Field Name	Type	Description	Default Value
`languageModel`	string | object	Specifies the language model to use; either a model name string or a detailed object (see Language Model Info).	`AZURE_GPT_4o_2024_1120`
`questionAnsweringSystemPromptTemplate`	string	Template for the system prompt in question answering.	See Default Configuration above
`questionAnsweringTriggerPromptTemplate`	string	Template for the trigger prompt in question answering.	See Default Configuration above
`toolSearchSystemPromptTemplate`	string	Template for the system prompt to determine whether a web search should be performed.	See Default Configuration above
`toolSearchTriggerPromptTemplate`	string	Template for the trigger prompt to determine whether a web search should be performed.	See Default Configuration above
`limitTokenSources`	integer	Token limit for the sources included in the prompt.	`10000`
`numberHistoryInteractionsIncluded`	integer	Number of previous interactions from the conversation history to include.	`6`
`searchEngineConfig`	object	Search engine configuration (see Search Engine Configuration).	See Default Configuration above
`queryGenerationConfig`	object	Query generation configuration (see Search Engine Configuration).	See below
`crawlerConfig`	object	Defines the crawler used to scrape content from web pages (see Crawler Setup and Configuration).	See Default Configuration above
`contentAdapterConfig`	object	Post-processes web page contents and transforms them into chunks (see Content Processing Configuration).	See Default Configuration above
`chunkRelevancySortConfig`	object	Reranks relevant chunks with an LLM (see Chunk Relevancy Sort).	See Default Configuration above
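
The prompt templates use `$name` placeholders such as `$current_date`, `$query`, and `$language`. The sketch below shows how such placeholders could be filled, assuming they follow the conventions of Python's `string.Template`; whether the module renders them this way is not confirmed by this page, and the template strings are shortened excerpts of the defaults.

```python
from datetime import date
from string import Template

# Shortened excerpts of the default templates, for illustration only.
system_template = Template(
    "You are helping the employees with their questions.\n\n"
    "Current date: $current_date.\n"
)
trigger_template = Template(
    "question:\n$query\n\n"
    "Answer concisely in $language (same language as question) "
    "and ALWAYS reference each of your facts:"
)

system_prompt = system_template.substitute(
    current_date=date(2025, 1, 15).isoformat()  # fixed date for a reproducible example
)
trigger_prompt = trigger_template.substitute(
    query="What changed in the latest release?",
    language="English",
)

# safe_substitute leaves unknown placeholders in place instead of raising KeyError.
partial = Template("$query in $language").safe_substitute(query="Hello")

print(system_prompt)
print(trigger_prompt)
print(partial)
```

When editing a template, keeping every placeholder name unchanged is the safer choice, since a renamed or misspelled placeholder would either raise an error or be passed to the model verbatim, depending on how substitution is performed.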

Query Generation

Field Name	Type	Description	Default Value
`skipQueryGeneration`	boolean	If `true`, the user message is sent to the search engine as is; if `false`, the LLM formulates the search query.	`false`
`queryInstruction`	string	The instruction given to the LLM to generate the search query.	See Default Configuration above
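
The effect of `skipQueryGeneration` can be sketched as a simple branch. Everything in this example is hypothetical: `build_search_query` is not a function of the module, and `fake_generate_query` only imitates what an LLM guided by `queryInstruction` might do for a follow-up question.

```python
from typing import Callable, Sequence


def build_search_query(
    user_message: str,
    history: Sequence[str],
    skip_query_generation: bool,
    generate_query: Callable[[str, Sequence[str]], str],
) -> str:
    """Return the string that would be sent to the search engine."""
    if skip_query_generation:
        # The user message is forwarded unchanged.
        return user_message
    # Otherwise a query is generated from the message and the conversation history.
    return generate_query(user_message, history)


def fake_generate_query(user_message: str, history: Sequence[str]) -> str:
    """Stand-in for the LLM call: prepend the previous user turn for context."""
    previous = history[-1] if history else ""
    return f"{previous} {user_message}".strip()


history = ["Tesla Q3 earnings"]
message = "and the stock price?"

print(build_search_query(message, history, False, fake_generate_query))
print(build_search_query(message, history, True, fake_generate_query))
```

The example illustrates why query generation is enabled by default: a follow-up question on its own often lacks the context a search engine needs to return useful results.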

Configuration JSON Schema

json

{
    "$defs": {
        "BasicCrawlerConfig": {
            "properties": {
                "crawlerType": {
                    "const": "BasicCrawler",
                    "default": "BasicCrawler",
                    "title": "Crawler Type",
                    "type": "string"
                },
                "semaphoreCount": {
                    "default": 10,
                    "description": "The number of concurrent requests to make to the same domain.",
                    "title": "Semaphore Count",
                    "type": "integer"
                },
                "timeout": {
                    "default": 5.0,
                    "description": "The timeout for the requests in seconds. This applies to the overall request timeout including connect, read, and write operations.",
                    "title": "Timeout",
                    "type": "number"
                },
                "cleaningStrategyConfig": {
                    "$ref": "#/$defs/CleaningStrategyConfig",
                    "description": "The cleaning strategy configuration."
                }
            },
            "title": "Basic Crawler Config",
            "type": "object"
        },
        "CacheMode": {
            "description": "Defines the caching behavior for web crawling operations.\n\nModes:\n- ENABLED: Normal caching behavior (read and write)\n- DISABLED: No caching at all\n- READ_ONLY: Only read from cache, don't write\n- WRITE_ONLY: Only write to cache, don't read\n- BYPASS: Bypass cache for this operation",
            "enum": [
                "enabled",
                "disabled",
                "read_only",
                "write_only",
                "bypass"
            ],
            "title": "CacheMode",
            "type": "string"
        },
        "ChunkRelevancySortConfig": {
            "properties": {
                "enabled": {
                    "default": false,
                    "description": "Whether to enable the chunk relevancy sort.",
                    "title": "Enabled",
                    "type": "boolean"
                },
                "relevancyLevelsToConsider": {
                    "default": [
                        "high",
                        "medium",
                        "low"
                    ],
                    "description": "The relevancy levels to consider.",
                    "items": {
                        "type": "string"
                    },
                    "title": "Relevancylevelstoconsider",
                    "type": "array"
                },
                "relevancyLevelOrder": {
                    "additionalProperties": {
                        "type": "integer"
                    },
                    "default": {
                        "high": 0,
                        "medium": 1,
                        "low": 2
                    },
                    "description": "The relevancy level order.",
                    "title": "Relevancylevelorder",
                    "type": "object"
                },
                "languageModel": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/LanguageModelName"
                        },
                        {
                            "$ref": "#/$defs/LanguageModelInfo"
                        }
                    ],
                    "default": "AZURE_GPT_35_TURBO_0125",
                    "description": "The language model to use for the chunk relevancy sort."
                },
                "fallbackLanguageModel": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/LanguageModelName"
                        },
                        {
                            "$ref": "#/$defs/LanguageModelInfo"
                        }
                    ],
                    "default": "AZURE_GPT_35_TURBO_0125",
                    "description": "The language model to use as a fallback."
                },
                "additionalLlmOptions": {
                    "additionalProperties": true,
                    "default": {},
                    "description": "Additional options to pass to the language model.",
                    "title": "Additionalllmoptions",
                    "type": "object"
                },
                "structuredOutputConfig": {
                    "$ref": "#/$defs/StructuredOutputConfig",
                    "description": "The configuration for the structured output."
                },
                "maxTasks": {
                    "default": null,
                    "title": "Maxtasks",
                    "type": "integer"
                }
            },
            "title": "ChunkRelevancySortConfig",
            "type": "object"
        },
        "CleaningStrategyConfig": {
            "properties": {
                "markdownCleaningTimeout": {
                    "default": 5.0,
                    "description": "Timeout for markdown cleaning.",
                    "title": "Markdown Cleaning Timeout",
                    "type": "number"
                },
                "removeNestedImagesAndLinks": {
                    "default": true,
                    "description": "Whether to clean nested images and links in the content.",
                    "title": "Remove Nested Images And Links",
                    "type": "boolean"
                },
                "removeSimpleImagesAndLinks": {
                    "default": true,
                    "description": "Whether to clean simple images and links in the content.",
                    "title": "Remove Simple Images And Links",
                    "type": "boolean"
                },
                "removeMultipleLinebreaks": {
                    "default": true,
                    "description": "Whether to clean multiple linebreaks in the content.",
                    "title": "Remove Multiple Linebreaks",
                    "type": "boolean"
                },
                "removeRepeatingPatterns": {
                    "default": false,
                    "description": "Whether to clean repeating patterns in the content.",
                    "title": "Remove Repeating Patterns",
                    "type": "boolean"
                }
            },
            "title": "Cleaning Strategy Config",
            "type": "object"
        },
        "ContentAdapterConfig": {
            "properties": {
                "chunkSize": {
                    "default": 1000,
                    "description": "Number of chunks to split the search results",
                    "title": "Chunk Size",
                    "type": "integer"
                },
                "chunkingMaxWorkers": {
                    "default": 10,
                    "description": "Number of workers to embed search results",
                    "title": "Chunking Max Workers",
                    "type": "integer"
                },
                "contentProcessingStrategyConfig": {
                    "description": "The strategy to use for content processing",
                    "discriminator": {
                        "mapping": {
                            "summarize": "#/$defs/SummarizeWebpageConfig",
                            "truncate": "#/$defs/TruncatePageToMaxTokensConfig"
                        },
                        "propertyName": "strategy"
                    },
                    "oneOf": [
                        {
                            "$ref": "#/$defs/SummarizeWebpageConfig"
                        },
                        {
                            "$ref": "#/$defs/TruncatePageToMaxTokensConfig"
                        }
                    ],
                    "title": "Content Processing Strategy Config"
                }
            },
            "title": "Content Adapter Config",
            "type": "object"
        },
        "Crawl4AiCrawlerConfig": {
            "properties": {
                "crawlerType": {
                    "const": "Crawl4AiCrawler",
                    "default": "Crawl4AiCrawler",
                    "title": "Crawler Type",
                    "type": "string"
                },
                "semaphoreCount": {
                    "default": 10,
                    "description": "The number of concurrent requests to make to the same domain.",
                    "title": "Semaphore Count",
                    "type": "integer"
                },
                "timeout": {
                    "default": 5.0,
                    "description": "The timeout for the requests in seconds. This applies to the overall request timeout including connect, read, and write operations.",
                    "title": "Timeout",
                    "type": "number"
                },
                "cleaningStrategyConfig": {
                    "$ref": "#/$defs/CleaningStrategyConfig",
                    "description": "The cleaning strategy configuration."
                },
                "maxSessionPermit": {
                    "default": 10,
                    "description": "The maximum number of sessions to make to the same domain.",
                    "title": "Max Session Permit",
                    "type": "integer"
                },
                "markdownGeneratorConfig": {
                    "$ref": "#/$defs/MarkdownGeneratorConfig",
                    "description": "The markdown generator configuration"
                },
                "rateLimiterConfig": {
                    "$ref": "#/$defs/RateLimiterConfig",
                    "description": "The rate limiter configuration"
                },
                "crawlerConfig": {
                    "$ref": "#/$defs/CrawlerConfig",
                    "description": "The crawler configuration"
                },
                "pruningContentFilterConfig": {
                    "$ref": "#/$defs/PruningContentFilterConfig",
                    "description": "The pruning content filter configuration"
                }
            },
            "title": "Crawl4 Ai Crawler Config",
            "type": "object"
        },
        "CrawlerConfig": {
            "properties": {
                "cacheMode": {
                    "$ref": "#/$defs/CacheMode",
                    "default": "bypass",
                    "description": "The cache mode",
                    "title": "Cache Mode"
                },
                "scanFullPage": {
                    "default": false,
                    "description": "Whether to scan the full page",
                    "title": "Scan Full Page",
                    "type": "boolean"
                },
                "waitUntil": {
                    "default": "domcontentloaded",
                    "description": "The condition to wait for when navigating",
                    "title": "Wait Until",
                    "type": "string"
                },
                "scrollDelay": {
                    "default": 0.05,
                    "description": "The delay to scroll the page",
                    "title": "Scroll Delay",
                    "type": "number"
                },
                "removeOverlayElements": {
                    "default": true,
                    "description": "Whether to remove the overlay elements",
                    "title": "Remove Overlay Elements",
                    "type": "boolean"
                },
                "simulateUser": {
                    "default": true,
                    "description": "Whether to simulate the user",
                    "title": "Simulate User",
                    "type": "boolean"
                },
                "overrideNavigator": {
                    "default": true,
                    "description": "Whether to override the navigator",
                    "title": "Override Navigator",
                    "type": "boolean"
                }
            },
            "title": "Crawler Config",
            "type": "object"
        },
        "EncoderName": {
            "enum": [
                "o200k_base",
                "cl100k_base"
            ],
            "title": "EncoderName",
            "type": "string"
        },
        "GoogleConfig": {
            "properties": {
                "searchEngineName": {
                    "const": "Google",
                    "default": "Google",
                    "title": "Search Engine Name",
                    "type": "string"
                },
                "fetchSize": {
                    "default": 10,
                    "description": "Number of search results to fetch",
                    "title": "Fetch Size",
                    "type": "integer"
                },
                "urlPatternBlacklist": {
                    "default": [
                        ".*\\.pdf$"
                    ],
                    "description": "List of URL patterns to blacklist",
                    "items": {
                        "type": "string"
                    },
                    "title": "Url Pattern Blacklist",
                    "type": "array"
                },
                "bannedDomains": {
                    "default": [],
                    "description": "List of banned domains",
                    "items": {
                        "type": "string"
                    },
                    "title": "Banned Domains",
                    "type": "array"
                },
                "customSearchConfig": {
                    "$ref": "#/$defs/GoogleSearchOptionalQueryParams",
                    "default": {
                        "cx": null,
                        "c2Coff": null,
                        "cr": null,
                        "exactTerms": null,
                        "excludeTerms": null,
                        "fileType": null,
                        "filter": null,
                        "gl": null,
                        "highRange": null,
                        "hl": null,
                        "hq": null,
                        "linkSite": null,
                        "lowRange": null,
                        "lr": null,
                        "orTerms": null,
                        "rights": null,
                        "safe": "active",
                        "searchType": null,
                        "siteSearch": null,
                        "siteSearchFilter": null,
                        "sort": null
                    },
                    "title": "Custom Search Config"
                }
            },
            "title": "Google Config",
            "type": "object"
        },
        "GoogleSearchOptionalQueryParams": {
            "description": "Optional Google Custom Search API query parameters.\nBased on the official Google Custom Search JSON API documentation.",
            "properties": {
                "cx": {
                    "default": null,
                    "description": "The Programmable Search Engine ID to use for this request. If not provided, the default Programmable Search Engine ID will be used.",
                    "title": "Cx",
                    "type": "string"
                },
                "c2Coff": {
                    "default": null,
                    "description": "Enables or disables Simplified and Traditional Chinese Search. 0: Enabled (default), 1: Disabled",
                    "enum": [
                        "0",
                        "1"
                    ],
                    "title": "C2Coff",
                    "type": "string"
                },
                "cr": {
                    "default": null,
                    "description": "Restricts search results to documents originating in a particular country",
                    "title": "Cr",
                    "type": "string"
                },
                "exactTerms": {
                    "default": null,
                    "description": "Identifies a phrase that all documents in the search results must contain",
                    "title": "Exact Terms",
                    "type": "string"
                },
                "excludeTerms": {
                    "default": null,
                    "description": "Identifies a word or phrase that should not appear in any documents in the search results",
                    "title": "Exclude Terms",
                    "type": "string"
                },
                "fileType": {
                    "default": null,
                    "description": "Restricts results to files of a specified extension",
                    "title": "File Type",
                    "type": "string"
                },
                "filter": {
                    "default": null,
                    "description": "Controls turning on or off the duplicate content filter. 0: Turns off, 1: Turns on",
                    "enum": [
                        "0",
                        "1"
                    ],
                    "title": "Filter",
                    "type": "string"
                },
                "gl": {
                    "default": null,
                    "description": "Geolocation of end user. Two-letter country code",
                    "title": "Gl",
                    "type": "string"
                },
                "highRange": {
                    "default": null,
                    "description": "Specifies the ending value for a search range",
                    "title": "High Range",
                    "type": "string"
                },
                "hl": {
                    "default": null,
                    "description": "Sets the user interface language",
                    "title": "Hl",
                    "type": "string"
                },
                "hq": {
                    "default": null,
                    "description": "Appends the specified query terms to the query, as if they were combined with a logical AND operator",
                    "title": "Hq",
                    "type": "string"
                },
                "linkSite": {
                    "default": null,
                    "description": "Specifies that all search results should contain a link to a particular URL",
                    "title": "Link Site",
                    "type": "string"
                },
                "lowRange": {
                    "default": null,
                    "description": "Specifies the starting value for a search range",
                    "title": "Low Range",
                    "type": "string"
                },
                "lr": {
                    "default": null,
                    "description": "Restricts the search to documents written in a particular language (e.g., lr=lang_ja)",
                    "title": "Lr",
                    "type": "string"
                },
                "orTerms": {
                    "default": null,
                    "description": "Provides additional search terms to check for in a document",
                    "title": "Or Terms",
                    "type": "string"
                },
                "rights": {
                    "default": null,
                    "description": "Filters based on licensing. Supported values include: cc_publicdomain, cc_attribute, cc_sharealike, cc_noncommercial, cc_nonderived",
                    "title": "Rights",
                    "type": "string"
                },
                "safe": {
                    "$ref": "#/$defs/Safe",
                    "default": "active",
                    "description": "Search safety level"
                },
                "searchType": {
                    "$ref": "#/$defs/SearchType",
                    "default": null,
                    "description": "Specifies the search type: image. If unspecified, results are limited to webpages",
                    "title": "Search Type"
                },
                "siteSearch": {
                    "default": null,
                    "description": "Specifies a given site which should always be included or excluded from results",
                    "title": "Site Search",
                    "type": "string"
                },
                "siteSearchFilter": {
                    "$ref": "#/$defs/SiteSearchFilter",
                    "default": null,
                    "description": "Controls whether to include or exclude results from the site named in the siteSearch parameter",
                    "title": "Site Search Filter"
                },
                "sort": {
                    "default": null,
                    "description": "The sort expression to apply to the results. Example value: date",
                    "title": "Sort",
                    "type": "string"
                }
            },
            "title": "Google Search Optional Query Params",
            "type": "object"
        },
        "LanguageModelInfo": {
            "properties": {
                "name": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/LanguageModelName"
                        },
                        {
                            "type": "string"
                        }
                    ],
                    "title": "Name"
                },
                "version": {
                    "title": "Version",
                    "type": "string"
                },
                "provider": {
                    "$ref": "#/$defs/LanguageModelProvider"
                },
                "encoder_name": {
                    "$ref": "#/$defs/EncoderName",
                    "default": "cl100k_base"
                },
                "token_limits": {
                    "$ref": "#/$defs/LanguageModelTokenLimits",
                    "default": {
                        "token_limit_input": 7000,
                        "token_limit_output": 1000
                    }
                },
                "capabilities": {
                    "default": [
                        "streaming"
                    ],
                    "items": {
                        "$ref": "#/$defs/ModelCapabilities"
                    },
                    "title": "Capabilities",
                    "type": "array"
                },
                "info_cutoff_at": {
                    "default": null,
                    "format": "date",
                    "title": "Info Cutoff At",
                    "type": "string"
                },
                "published_at": {
                    "default": null,
                    "format": "date",
                    "title": "Published At",
                    "type": "string"
                },
                "retirement_at": {
                    "default": null,
                    "format": "date",
                    "title": "Retirement At",
                    "type": "string"
                },
                "deprecated_at": {
                    "default": null,
                    "format": "date",
                    "title": "Deprecated At",
                    "type": "string"
                },
                "retirement_text": {
                    "default": null,
                    "title": "Retirement Text",
                    "type": "string"
                }
            },
            "required": [
                "name",
                "version",
                "provider"
            ],
            "title": "LanguageModelInfo",
            "type": "object"
        },
        "LanguageModelName": {
            "enum": [
                "AZURE_GPT_35_TURBO_0125",
                "AZURE_GPT_4_0613",
                "AZURE_GPT_4_32K_0613",
                "AZURE_GPT_4_TURBO_2024_0409",
                "AZURE_GPT_4o_2024_0513",
                "AZURE_GPT_4o_2024_0806",
                "AZURE_GPT_4o_2024_1120",
                "AZURE_GPT_4o_MINI_2024_0718",
                "AZURE_o1_MINI_2024_0912",
                "AZURE_o1_2024_1217",
                "AZURE_o3_MINI_2025_0131",
                "AZURE_GPT_45_PREVIEW_2025_0227",
                "AZURE_GPT_41_2025_0414",
                "AZURE_GPT_41_MINI_2025_0414",
                "AZURE_GPT_41_NANO_2025_0414",
                "AZURE_o3_2025_0416",
                "AZURE_o4_MINI_2025_0416",
                "litellm:anthropic-claude-3-7-sonnet",
                "litellm:anthropic-claude-3-7-sonnet-thinking",
                "litellm:anthropic-claude-sonnet-4",
                "litellm:anthropic-claude-opus-4",
                "litellm:gemini-2-0-flash",
                "litellm:gemini-2-5-flash",
                "litellm:gemini-2-5-flash-lite-preview-06-17",
                "litellm:gemini-2-5-flash-preview-04-17",
                "litellm:gemini-2-5-flash-preview-05-20",
                "litellm:gemini-2-5-pro",
                "litellm:gemini-2-5-pro-exp-03-25",
                "litellm:gemini-2-5-pro-preview-06-05"
            ],
            "title": "LanguageModelName",
            "type": "string"
        },
        "LanguageModelProvider": {
            "enum": [
                "AZURE",
                "CUSTOM",
                "LITELLM"
            ],
            "title": "LanguageModelProvider",
            "type": "string"
        },
        "LanguageModelTokenLimits": {
            "properties": {
                "token_limit_input": {
                    "title": "Token Limit Input",
                    "type": "integer"
                },
                "token_limit_output": {
                    "title": "Token Limit Output",
                    "type": "integer"
                }
            },
            "required": [
                "token_limit_input",
                "token_limit_output"
            ],
            "title": "LanguageModelTokenLimits",
            "type": "object"
        },
        "MarkdownGeneratorConfig": {
            "properties": {
                "options": {
                    "additionalProperties": true,
                    "default": {
                        "ignore_links": true,
                        "ignore_emphasis": true,
                        "ignore_images": true
                    },
                    "description": "The options for the markdown generator",
                    "title": "Options",
                    "type": "object"
                }
            },
            "title": "Markdown Generator Config",
            "type": "object"
        },
        "ModelCapabilities": {
            "enum": [
                "function_calling",
                "parallel_function_calling",
                "reproducible_output",
                "structured_output",
                "vision",
                "streaming",
                "reasoning"
            ],
            "title": "ModelCapabilities",
            "type": "string"
        },
        "NoCrawlerConfig": {
            "properties": {
                "crawlerType": {
                    "const": "NoCrawler",
                    "default": "NoCrawler",
                    "title": "Crawler Type",
                    "type": "string"
                },
                "semaphoreCount": {
                    "default": 10,
                    "description": "The number of concurrent requests to make to the same domain.",
                    "title": "Semaphore Count",
                    "type": "integer"
                },
                "timeout": {
                    "default": 5.0,
                    "description": "The timeout for the requests in seconds. This applies to the overall request timeout including connect, read, and write operations.",
                    "title": "Timeout",
                    "type": "number"
                },
                "cleaningStrategyConfig": {
                    "$ref": "#/$defs/CleaningStrategyConfig",
                    "description": "The cleaning strategy configuration."
                }
            },
            "title": "No Crawler Config",
            "type": "object"
        },
        "PruningContentFilterConfig": {
            "properties": {
                "enabled": {
                    "default": true,
                    "description": "Whether to enable the content filter",
                    "title": "Enabled",
                    "type": "boolean"
                },
                "threshold": {
                    "default": 0.5,
                    "description": "The threshold for the content filter",
                    "title": "Threshold",
                    "type": "number"
                },
                "thresholdType": {
                    "default": "fixed",
                    "description": "The type of threshold",
                    "enum": [
                        "fixed",
                        "dynamic"
                    ],
                    "title": "Threshold Type",
                    "type": "string"
                },
                "minWordThreshold": {
                    "default": 10,
                    "description": "The minimum number of words to keep",
                    "title": "Min Word Threshold",
                    "type": "integer"
                }
            },
            "title": "Pruning Content Filter Config",
            "type": "object"
        },
        "QueryGenerationConfig": {
            "properties": {
                "skipQueryGeneration": {
                    "default": false,
                    "description": "Skip query generation",
                    "title": "Skip Query Generation",
                    "type": "boolean"
                },
                "queryInstruction": {
                    "default": "The user's search query, optimized for search engines, incorporates relevant details from the conversation, especially if it is a follow-up question. Always use the user's language message.",
                    "description": "Instruction of the query parameter",
                    "title": "Query Instruction",
                    "type": "string"
                }
            },
            "title": "Query Generation Config",
            "type": "object"
        },
        "RateLimiterConfig": {
            "properties": {
                "baseDelay": {
                    "default": [
                        0.5,
                        1.0
                    ],
                    "description": "The range for a random delay (in seconds) between consecutive requests to the same domain.",
                    "maxItems": 2,
                    "minItems": 2,
                    "prefixItems": [
                        {
                            "type": "number"
                        },
                        {
                            "type": "number"
                        }
                    ],
                    "title": "Base Delay",
                    "type": "array"
                },
                "maxDelay": {
                    "default": 1.0,
                    "description": "The maximum allowable delay when rate-limiting errors occur",
                    "title": "Max Delay",
                    "type": "number"
                },
                "maxRetries": {
                    "default": 0,
                    "description": "The maximum number of retries to make when rate-limiting errors occur",
                    "title": "Max Retries",
                    "type": "integer"
                },
                "rateLimitCodes": {
                    "default": [
                        429,
                        503
                    ],
                    "description": "The HTTP status codes that indicate rate-limiting errors",
                    "items": {
                        "type": "integer"
                    },
                    "title": "Rate Limit Codes",
                    "type": "array"
                }
            },
            "title": "Rate Limiter Config",
            "type": "object"
        },
        "Safe": {
            "enum": [
                "active",
                "off"
            ],
            "title": "Safe",
            "type": "string"
        },
        "SearchType": {
            "enum": [
                "image"
            ],
            "title": "SearchType",
            "type": "string"
        },
        "SiteSearchFilter": {
            "enum": [
                "e",
                "i"
            ],
            "title": "SiteSearchFilter",
            "type": "string"
        },
        "StructuredOutputConfig": {
            "properties": {
                "enabled": {
                    "default": false,
                    "description": "Whether to use structured output for the evaluation.",
                    "title": "Enabled",
                    "type": "boolean"
                },
                "extract_fact_list": {
                    "default": false,
                    "description": "Whether to extract a list of relevant facts from context chunks with structured output.",
                    "title": "Extract Fact List",
                    "type": "boolean"
                },
                "reason_description": {
                    "default": "A brief explanation justifying your evaluation decision.",
                    "description": "The description of the reason field for structured output.",
                    "title": "Reason Description",
                    "type": "string"
                },
                "value_description": {
                    "default": "Assessment of how relevant the facts are to the query. Must be one of: ['low', 'medium', 'high'].",
                    "description": "The description of the value field for structured output.",
                    "title": "Value Description",
                    "type": "string"
                },
                "fact_description": {
                    "default": "A fact is an information that is directly answers the user's query. Make sure to emphasize the important information from the fact with bold text.",
                    "description": "The description of the fact field for structured output.",
                    "title": "Fact Description",
                    "type": "string"
                },
                "fact_list_description": {
                    "default": "A list of relevant facts extracted from the source that supports or answers the user's query.",
                    "description": "The description of the fact list field for structured output.",
                    "title": "Fact List Description",
                    "type": "string"
                }
            },
            "title": "StructuredOutputConfig",
            "type": "object"
        },
        "SummarizeWebpageConfig": {
            "properties": {
                "strategy": {
                    "const": "summarize",
                    "default": "summarize",
                    "title": "Strategy",
                    "type": "string"
                },
                "preTruncateToMaxTokens": {
                    "default": 30000,
                    "description": "Max number of tokens to keep when truncating the page before summarization",
                    "title": "Pre Truncate To Max Tokens",
                    "type": "integer"
                },
                "minTokensTriggerSummarization": {
                    "default": 5000,
                    "description": "Min number of tokens to trigger summarization",
                    "title": "Min Tokens Trigger Summarization",
                    "type": "integer"
                },
                "summarizationSystemPrompt": {
                    "default": "You are a helping assistant that generates query focused summarization of a webpage content. The summary should convey any information that is relevant to the query.",
                    "description": "The system prompt to use for summarization",
                    "title": "Summarization System Prompt",
                    "type": "string"
                }
            },
            "title": "Summarize Webpage Config",
            "type": "object"
        },
        "TruncatePageToMaxTokensConfig": {
            "properties": {
                "strategy": {
                    "const": "truncate",
                    "default": "truncate",
                    "title": "Strategy",
                    "type": "string"
                },
                "truncateToMaxTokens": {
                    "default": 5000,
                    "description": "Max number of tokens to truncate the page to",
                    "title": "Truncate To Max Tokens",
                    "type": "integer"
                }
            },
            "title": "Truncate Page To Max Tokens Config",
            "type": "object"
        }
    },
    "properties": {
        "languageModel": {
            "anyOf": [
                {
                    "$ref": "#/$defs/LanguageModelName"
                },
                {
                    "$ref": "#/$defs/LanguageModelInfo"
                }
            ],
            "default": "AZURE_GPT_4o_2024_1120",
            "title": "Language Model"
        },
        "questionAnsweringSystemPromptTemplate": {
            "default": "You are helping the employees with their questions. You will find below a question and some sources extracted from webpages (they are delimited with XML tags).\n\nAnswer the employee's question using ONLY facts from the sources or past conversation. Information helping the employee's question can also be added.\n\nIf not specified, format the answer using an introduction followed by a list of bullet points. The facts you add should ALWAYS help answering the question.\n\nSTRICTLY reference each fact you use. A fact is preferably referenced by ONLY ONE source e.g [sourceX].\n\nHere is an example on how to reference sources (referenced facts must STRICTLY match the source number):\n- Some information retrieved from source N\u00b0X.[sourceX]\n- Some information retrieved from source N\u00b0Y and some information retrieved from source N\u00b0Z.[sourceY][sourceZ]\n\nMake sure to respect the markdown link format and wrap each link reference between parenthesis.\n\nCurrent date: $current_date.\n",
            "title": "Question Answering System Prompt Template",
            "type": "string"
        },
        "questionAnsweringTriggerPromptTemplate": {
            "default": "sources:\n```\n$content\n```\n\nquestion:\n```\n$query\n```\n\nAnswer concisely in $language (same language as question) and ALWAYS reference each of your facts:",
            "title": "Question Answering Trigger Prompt Template",
            "type": "string"
        },
        "toolSearchSystemPromptTemplate": {
            "default": "You are ChatGPT, a large language model trained by OpenAI.\n\nKnowledge cutoff: $info_cutoff_at.\nCurrent date: $current_date.\n\nYou have access to information in the web, via the `$web_search` tool. If this tool is helpful to answer the question, you can use it to find relevant information.\n",
            "title": "Tool Search System Prompt Template",
            "type": "string"
        },
        "toolSearchTriggerPromptTemplate": {
            "default": "$query",
            "title": "Tool Search Trigger Prompt Template",
            "type": "string"
        },
        "limitTokenSources": {
            "default": 10000,
            "description": "Token Source Limit",
            "title": "Limit Token Sources",
            "type": "integer"
        },
        "numberHistoryInteractionsIncluded": {
            "default": 6,
            "description": "Number of history interactions included",
            "title": "Number History Interactions Included",
            "type": "integer"
        },
        "searchEngineConfig": {
            "description": "Search Engine Configuration",
            "discriminator": {
                "mapping": {
                    "Google": "#/$defs/GoogleConfig"
                },
                "propertyName": "searchEngineName"
            },
            "oneOf": [
                {
                    "$ref": "#/$defs/GoogleConfig"
                }
            ],
            "title": "Search Engine Config"
        },
        "queryGenerationConfig": {
            "$ref": "#/$defs/QueryGenerationConfig",
            "description": "Query Generation Configuration"
        },
        "crawlerConfig": {
            "description": "The crawler configuration",
            "discriminator": {
                "mapping": {
                    "BasicCrawler": "#/$defs/BasicCrawlerConfig",
                    "Crawl4AiCrawler": "#/$defs/Crawl4AiCrawlerConfig",
                    "NoCrawler": "#/$defs/NoCrawlerConfig"
                },
                "propertyName": "crawlerType"
            },
            "oneOf": [
                {
                    "$ref": "#/$defs/Crawl4AiCrawlerConfig"
                },
                {
                    "$ref": "#/$defs/BasicCrawlerConfig"
                },
                {
                    "$ref": "#/$defs/NoCrawlerConfig"
                }
            ],
            "title": "Crawler Config"
        },
        "contentAdapterConfig": {
            "$ref": "#/$defs/ContentAdapterConfig",
            "description": "The content adapter configuration"
        },
        "chunkRelevancySortConfig": {
            "$ref": "#/$defs/ChunkRelevancySortConfig",
            "default": {
                "enabled": false,
                "relevancyLevelsToConsider": [
                    "high",
                    "medium",
                    "low"
                ],
                "relevancyLevelOrder": {
                    "high": 0,
                    "low": 2,
                    "medium": 1
                },
                "languageModel": "AZURE_GPT_35_TURBO_0125",
                "fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
                "additionalLlmOptions": {},
                "structuredOutputConfig": {
                    "enabled": false,
                    "extract_fact_list": false,
                    "fact_description": "A fact is an information that is directly answers the user's query. Make sure to emphasize the important information from the fact with bold text.",
                    "fact_list_description": "A list of relevant facts extracted from the source that supports or answers the user's query.",
                    "reason_description": "A brief explanation justifying your evaluation decision.",
                    "value_description": "Assessment of how relevant the facts are to the query. Must be one of: ['low', 'medium', 'high']."
                },
                "maxTasks": null
            },
            "title": "Chunk Relevancy Sort Config"
        }
    },
    "title": "Web Search Config",
    "type": "object"
}
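
The prompt templates in this schema contain `$name` placeholders such as `$current_date`, `$info_cutoff_at`, `$web_search`, `$query`, `$content`, and `$language`. The sketch below shows one way such placeholders could be filled, assuming they follow the syntax of Python's `string.Template`. The helper `render_prompt`, the shortened template text, and the sample values are illustrative assumptions, not the module's actual implementation.

```python
from string import Template

# Shortened copy of the toolSearchSystemPromptTemplate default from the schema.
TOOL_SEARCH_SYSTEM_PROMPT = (
    "Knowledge cutoff: $info_cutoff_at.\n"
    "Current date: $current_date.\n\n"
    "You have access to information in the web, via the `$web_search` tool."
)


def render_prompt(template: str, **values: str) -> str:
    """Fill every $placeholder in the template (hypothetical helper).

    Raises KeyError if a placeholder has no matching value, which makes
    a misspelled placeholder in a custom template visible immediately.
    """
    return Template(template).substitute(**values)


# Sample values are made up for illustration only.
rendered = render_prompt(
    TOOL_SEARCH_SYSTEM_PROMPT,
    info_cutoff_at="2023-10-01",
    current_date="2025-01-15",
    web_search="WebSearch",
)
print(rendered)
```

When you override a prompt template, keeping the same placeholder names as the default is the safest choice, because an unknown placeholder cannot be filled by the module.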

Tool Definition

The tool definition is used when multiple modules are combined in the same space. It declares a function named WebSearch with a single required string parameter, query.

json

{
  "type": "function",
  "function": {
    "name": "WebSearch",
    "parameters": {
      "type": "object",
      "required": [
        "query"
      ],
      "properties": {
        "query": {
          "type": "string",
          "description": "User's query that needs to be used to retrieve information from the web using a search engine."
        }
      }
    },
    "description": "Useful to answer user's query based on information from the web."
  }
}
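
The following sketch illustrates how arguments for this tool could be checked before a search is run. It assumes the model returns the arguments as a JSON string, as is common with function calling; the helper `parse_tool_arguments` and the sample query are hypothetical and not part of the module.

```python
import json

# Tool definition copied from the section above.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "WebSearch",
        "parameters": {
            "type": "object",
            "required": ["query"],
            "properties": {
                "query": {
                    "type": "string",
                    "description": "User's query that needs to be used to retrieve "
                    "information from the web using a search engine.",
                }
            },
        },
        "description": "Useful to answer user's query based on information from the web.",
    },
}


def parse_tool_arguments(tool: dict, raw_arguments: str) -> dict:
    """Parse a JSON arguments string and check it against the tool's parameters.

    Hypothetical helper: it only covers required names and string types,
    which is all this tool definition uses.
    """
    params = tool["function"]["parameters"]
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [name for name in params.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, spec in params["properties"].items():
        if name in args and spec.get("type") == "string" and not isinstance(args[name], str):
            raise ValueError(f"argument {name!r} must be a string")
    return args


# Sample arguments string, made up for illustration.
args = parse_tool_arguments(WEB_SEARCH_TOOL, '{"query": "latest interest rate decision"}')
print(args["query"])
```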