Internal Search

Functionality

This tool retrieves information from data stored in the knowledge center. It performs a semantic search (using VectorDB) and/or a full-text search (using PostgreSQL), based on the search query generated by the agent. Optionally, the results can be re-ranked using chunk relevance scoring so that the most relevant content is returned to the agent.

Configuration

The fields in this schema configure how the tool searches the internal knowledge base: which search type, language, and scopes are used, how many chunks are returned and how they are sorted, and how the tool is described to the agent in the system prompt.

Default Configuration

json

{
        "chatOnly": false,
        "chunkRelevancySortConfig": {
          "additionalLlmOptions": {},
          "enabled": false,
          "fallbackLanguageModel": "AZURE_GPT_4o_2024_1120",
          "languageModel": "AZURE_GPT_4o_2024_1120",
          "maxTasks": 1000,
          "relevancyLevelOrder": {
            "high": 0,
            "low": 2,
            "medium": 1
          },
          "relevancyLevelsToConsider": [
            "high",
            "medium",
            "low"
          ],
          "structuredOutputConfig": {
            "enabled": false,
            "extractFactList": false,
            "factDescription": "A fact is an information that is directly answers the user's query. Make sure to emphasize the important information from the fact with bold text.",
            "factListDescription": "A list of relevant facts extracted from the source that supports or answers the user's query.",
            "reasonDescription": "A brief explanation justifying your evaluation decision.",
            "valueDescription": "Assessment of how relevant the facts are to the query. Must be one of: ['low', 'medium', 'high']."
          }
        },
        "chunkedSources": true,
        "enableMultipleSearchStringsExecution": true,
        "evaluationCheckList": [
          "hallucination"
        ],
        "limit": 1000,
        "maxSearchStrings": 10,
        "metadataChunkSections": {},
        "paramDescriptionLanguage": "The language that the user wrote the query in",
        "paramDescriptionSearchString": "An expanded term that is optimized for vector and full text search based on the users query it must be in english.",
        "percentageOfInputTokensForSources": 0.4,
        "rerankerConfig": null,
        "scopeIds": null,
        "scopeToChatOnUpload": false,
        "scoreThreshold": 0,
        "searchLanguage": "english",
        "searchType": "COMBINED",
        "toolDescription": "Search in the company knowledge base for information on policies, procedures, benefits, groups, financial information or specific people. This should be your go-to tool if no other tools are applicable.",
        "toolDescriptionForSystemPrompt": "You can use the InternalSearch tool to access internal company documentations, including information on policies, procedures, benefits, groups, financial details, and specific individuals. If this tool can help answer your question, feel free to use it to search the internal knowledge base for more information. If possible always try to get information from the internal knowledge base with the InternalSearch tool before using other tools.\nUse cases for the Internal Knowledge Search are:\n- User asks to work with a document: Most likely the document is uploaded to the chat and mentioned in a message and can be loaded with this tool\n- Policy and Procedure Verification: Use the internal search tool to find the most current company policies, procedures, or guidelines to ensure compliance and accuracy in responses.\n- Project-Specific Information: When answering questions related to ongoing projects or initiatives, use the internal search to access project documents, reports, or meeting notes for precise details.\n- Employee Directory and Contact Information: Utilize the internal search to locate contact details or organizational charts to facilitate communication and collaboration within the company.\n- Confidential and Proprietary Information: When dealing with sensitive topics that require proprietary knowledge or confidential data, use the internal search to ensure the information is sourced from secure and authorized company documents.\n\n**Instruction Query Splitting**\nYou should split the user question into multiple search strings when the user's question needs to be decomposed / rewritten to find different facts. Perform for each search string an individual tool call. Avoid short queries that are extremely broad and will return unrelated results. Strip the search string of any extraneous details, e.g. instructions or unnecessary context. 
However, you must fill in relevant context from the rest of the conversation to make the question complete. E.g. \"What was their age?\" => \"What was Kevin's age?\" because the preceding conversation makes it clear that the user is talking about Kevin.\n\nHere are some examples of how to use the InternalSearch tool:\nUser: What was the GDP of France and Italy in the 1970s? => search strings: [\"france gdp 1970\", \"italy gdp 1970\"] # Splitting of the query into 2 queries and perform 2 tool calls\nUser: What does the report say about the GPT4 performance on MMLU? => search strings: [\"GPT4 performance on MMLU?\"] # Simplify the query",
        "toolFormatInformationForSystemPrompt": "Whenever you use information retrieved with the InternalSearch, you must adhere to strict reference guidelines. You must strictly reference each fact used with the `source_number` of the corresponding passage, in the following format: '[source<source_number>]'.\n\nExample:\n- The stock price of Apple Inc. is $150 [source0] and the company's revenue increased by 10% [source1].\n- Moreover, the company's market capitalization is $2 trillion [source2][source3].\n- Our internal documents tell us to invest[source4] (Internal)\n\nA fact is preferably referenced by ONLY ONE source, e.g [sourceX], which should be the most relevant source for the fact.\nFollow these guidelines closely and be sure to use the proper `source_number` when referencing facts.\nMake sure that your reference follow the format [sourceX] and that the source number is correct.\nSource is written in singular form and the number is written in digits.\n\nIT IS VERY IMPORTANT TO FOLLOW THESE GUIDELINES!!\nNEVER CITE A source_number THAT YOU DON'T SEE IN THE TOOL CALL RESPONSE!!!\nThe source_number in old assistant messages are no longer valid.\nEXAMPLE: If you see [source34] and [source35] in the assistant message, you can't use [source34] again in the next assistant message, this has to be the number you find in the message with role 'tool'.\nBE AWARE:All tool calls have been filtered to remove uncited sources. Tool calls return much more data than you see\n\n### Internal Document Answering Protocol for Employee Questions\nWhen assisting employees using internal documents, follow\nthis structured approach to ensure precise, well-grounded,\nand context-aware responses:\n\n#### 1. 
Locate and Prioritize Relevant Internal Sources\nGive strong preference to:\n- **Most relevant documents**, such as:\n- **Documents authored by or involving** the employee or team in question\n- **Cross-validated sources**, especially when multiple documents agree\n  - Project trackers, design docs, decision logs, and OKRs\n  - Recently updated or active files\n\n#### 2. Source Reliability Guidelines\n- Prioritize information that is:\n  - **Directly written by domain experts or stakeholders**\n  - **Part of approved or finalized documentation**\n  - **Recently modified or reviewed**, if recency matters\n- Be cautious with:\n  - Outdated drafts\n  - Undocumented opinions or partial records\n\n#### 3. Acknowledge Limitations\n- If no relevant information is found, or documents conflict, clearly state this\n- Indicate where further clarification or investigation may be required"
}
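
The `relevancyLevelsToConsider` and `relevancyLevelOrder` defaults above suggest a filter-then-sort step. The following is a minimal sketch of how such a step could work, assuming each chunk has already been assigned a relevancy level by the language model. The `ScoredChunk` type and the `sort_by_relevancy` function are hypothetical names used for illustration only; they are not part of the tool's API.

```python
from dataclasses import dataclass

# Defaults copied from the configuration above.
RELEVANCY_LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}
RELEVANCY_LEVELS_TO_CONSIDER = ["high", "medium", "low"]


@dataclass
class ScoredChunk:
    """Hypothetical container: a chunk plus the level assigned to it."""
    text: str
    relevancy: str


def sort_by_relevancy(
    chunks,
    level_order=RELEVANCY_LEVEL_ORDER,
    levels_to_consider=RELEVANCY_LEVELS_TO_CONSIDER,
):
    """Drop chunks whose level is not considered, then sort by level order."""
    kept = [c for c in chunks if c.relevancy in levels_to_consider]
    # sorted() is stable, so chunks with the same level keep their search order.
    # Levels missing from level_order are placed last (an assumption).
    return sorted(kept, key=lambda c: level_order.get(c.relevancy, len(level_order)))


chunks = [
    ScoredChunk("travel policy", "low"),
    ScoredChunk("expense policy", "high"),
    ScoredChunk("holiday calendar", "medium"),
]
top = sort_by_relevancy(chunks, levels_to_consider=["high", "medium"])
print([c.text for c in top])  # → ['expense policy', 'holiday calendar']
```

Under this reading, removing `"low"` from `relevancyLevelsToConsider` would discard weakly related chunks entirely rather than merely ranking them last.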

| Field Name | Description | Type | Default Value |
|---|---|---|---|
| `chat_only` | Whether to search only the content uploaded to the chat. | boolean | `false` |
| `chunk_relevancy_sort_config` | The chunk relevancy sort configuration to use for the search (see Chunk Relevancy Sort). | object | See above |
| `chunked_sources` | Whether to chunk the sources. | boolean | `true` |
| `enable_multiple_search_strings_execution` | Whether multiple search queries are generated within a single tool call to improve search quality. | boolean | `true` |
| `evaluation_check_list` | The list of evaluation metrics to check. | array | `["hallucination"]` |
| `limit` | The maximum number of chunks to return. | integer | `50` |
| `max_search_strings` | The maximum number of search strings to execute in a single tool call. | integer | `10` |
| `metadata_chunk_sections` | Dictionary of metadata that should be appended to the search chunks. Each key names a metadata field (e.g., author, source, timestamp). Each value is a template string defining how the metadata is formatted and embedded in the chunk text. For example: `{ "source": "<\|source\|>{}<\|/source\|>", "created_at": "<\|created_at\|>{}<\|/created_at\|>" }` | dict | `{}` |
| `param_description_language` | Description of the `language` parameter. | string | See default above |
| `param_description_search_string` | Description of the `search_string` parameter. | string | See default above |
| `percentage_of_input_tokens_for_sources` | The fraction of the language model's maximum input tokens to use for the tool response. | float (0-1) | `0.4` |
| `reranker_config` | The reranker configuration to use for the search (see Reranker). | object | `null` |
| `scope_ids` | The scope IDs to use for the search. | array | `null` |
| `scope_to_chat_on_upload` | Whether to scope the search to the chat on upload. | boolean | `false` |
| `score_threshold` | The similarity cutoff for search results. Can be a value between 0 and 1, where 0 includes all results. | float | `0` |
| `search_language` | The language to use for the search. | string | `english` |
| `search_type` | The type of search to perform. Two possible values: `COMBINED` or `VECTOR`. | string | `COMBINED` |
| `tool_description` | Tool description. | string | See default above |
| `tool_description_for_system_prompt` | Tool description for the system prompt. | string | See default above |
| `tool_format_information_for_system_prompt` | Tool format information for the system prompt. | string | See default above |
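
To make a few of the fields above more concrete, here is a minimal sketch of how `metadata_chunk_sections`, `percentage_of_input_tokens_for_sources`, and `score_threshold` together with `limit` could be interpreted. All function names are hypothetical and not part of the tool. The sketch assumes the `{}` placeholder in each template is filled with the metadata value, that the rendered metadata is placed after the chunk text, and that the threshold is inclusive; the tool's real behavior may differ. The 100,000-token input limit in the usage lines is an arbitrary illustrative number.

```python
def render_metadata(metadata, sections):
    """Fill each configured template with the matching metadata value."""
    return "".join(
        template.format(metadata[key])
        for key, template in sections.items()
        if key in metadata  # fields without a value are skipped (assumption)
    )


def source_token_budget(max_input_tokens, percentage=0.4):
    """Tokens available for the tool response, as a share of the input limit."""
    return int(max_input_tokens * percentage)


def apply_threshold_and_limit(scored_chunks, score_threshold, limit):
    """Keep (text, score) pairs at or above the threshold, then cap the count."""
    kept = [c for c in scored_chunks if c[1] >= score_threshold]
    return kept[:limit]


# The unescaped form of the template example from the table.
sections = {
    "source": "<|source|>{}<|/source|>",
    "created_at": "<|created_at|>{}<|/created_at|>",
}
metadata = {"source": "handbook.pdf", "created_at": "2024-01-15"}
chunk = "Employees accrue vacation days monthly."

# Placement after the chunk text is an assumption.
print(chunk + "\n" + render_metadata(metadata, sections))
print(source_token_budget(100_000))  # → 40000

results = [("a", 0.9), ("b", 0.4), ("c", 0.7)]
print(apply_threshold_and_limit(results, score_threshold=0.5, limit=10))
# → [('a', 0.9), ('c', 0.7)]
```

With the default `score_threshold` of `0`, the inclusive comparison keeps every result, which matches the table's note that 0 includes all results.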