Chunk Relevancy Sort
The service is integrated into the following spaces and modules:
Functionality
The Chunk Relevancy Service is an optional post-processing service that improves the quality of search results by re-ranking text chunks retrieved from semantic vector search and/or combined search. Instead of relying solely on vector similarity and/or full-text search, it evaluates the actual relevance of each chunk to the user’s query using a language model.
Purpose
While vector search and full-text search are fast and effective, they may overlook subtle context or nuanced details. The Chunk Relevancy Service improves precision by evaluating each chunk individually against the query, so that the most contextually relevant information is prioritized in the final response.
How It Works
Initial Chunk Retrieval
A search query returns a list of top-ranked chunks from the vector search or a combined full-text and vector search.
Per-Chunk Relevance Evaluation
Each chunk is then passed, alongside the search query, into a dedicated language model call. The model evaluates the chunk’s relevance in context and classifies it as:
High
Medium
Low
Final Re-ranking
Based on the model-assigned relevance levels, the service reorders the chunks to ensure that the most important content appears first.
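The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rerank` and `toy_classifier` are hypothetical names, and the keyword-overlap classifier stands in for the real per-chunk language model call so that the example runs without any external service.

```python
from typing import Callable

# Lower number = earlier in the result list (mirrors the default relevancyLevelOrder).
LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}


def rerank(query: str, chunks: list[str], classify: Callable[[str, str], str]) -> list[str]:
    """Classify each chunk once, then sort the chunks by relevance level.

    `classify` is a hypothetical stand-in for the per-chunk language model call;
    it must return "high", "medium", or "low".
    """
    labelled = [(chunk, classify(query, chunk)) for chunk in chunks]
    # sorted() is stable, so chunks with the same level keep their retrieval order.
    ranked = sorted(labelled, key=lambda pair: LEVEL_ORDER[pair[1]])
    return [chunk for chunk, _ in ranked]


def toy_classifier(query: str, chunk: str) -> str:
    """Keyword-overlap placeholder, used only to make the example runnable."""
    overlap = len(set(query.lower().split()) & set(chunk.lower().split()))
    return "high" if overlap >= 2 else "medium" if overlap == 1 else "low"


chunks = [
    "Office opening hours",
    "Annual leave policy for employees",
    "Leave requests need manager approval",
]
result = rerank("annual leave policy", chunks, toy_classifier)
print(result)
```

Because the sort is stable, chunks that receive the same level stay in the order the initial search returned them, so the original ranking still acts as a tie-breaker in this sketch.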
Why Use Chunk Relevancy Sorting?
✅ Increased Accuracy: Goes beyond token similarity and keyword search by evaluating actual semantic relevance.
✅ Detail-Aware: Captures subtle context and phrasing missed by embeddings alone.
Trade-offs and Performance Impact
LLM Call per Chunk: Each chunk requires its own LLM call.
For example, re-ranking the top 100 chunks results in 100 individual LLM calls.
Latency: Additional processing time is introduced due to the sequential evaluation of chunks.
Cost: LLM usage increases significantly with the number of chunks being evaluated.
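To get a feel for these trade-offs, the overhead can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of how the service schedules its calls: `estimate_rerank_overhead` is a hypothetical helper, the 0.5 second per-call latency is an invented figure, and the assumption that calls run in equal batches of `max_tasks` is a simplification.

```python
import math


def estimate_rerank_overhead(num_chunks: int, seconds_per_call: float, max_tasks: int = 1) -> dict:
    """Rough estimate: one LLM call per chunk, executed in batches of `max_tasks`."""
    if num_chunks < 0 or max_tasks < 1:
        raise ValueError("num_chunks must be >= 0 and max_tasks must be >= 1")
    # Each batch is assumed to take as long as a single call.
    batches = math.ceil(num_chunks / max_tasks)
    return {
        "llm_calls": num_chunks,
        "estimated_seconds": batches * seconds_per_call,
    }


# 100 chunks evaluated one after another versus ten at a time.
sequential = estimate_rerank_overhead(100, seconds_per_call=0.5)
parallel = estimate_rerank_overhead(100, seconds_per_call=0.5, max_tasks=10)
print(sequential, parallel)
```

Under these assumptions the number of LLM calls, and therefore the cost, stays the same in both cases; only the estimated waiting time changes.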
Configuration
The ChunkRelevancySortConfig schema defines the settings for sorting data chunks based on relevancy.
Default Configuration
{
"enabled": false,
"relevancyLevelsToConsider": [
"high",
"medium",
"low"
],
"relevancyLevelOrder": {
"high": 0,
"medium": 1,
"low": 2
},
"languageModel": "AZURE_GPT_35_TURBO_0125",
"fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
"additionalLlmOptions": {},
"maxTasks": null
}
Fields Documentation
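| Field Name | Description | Type | Default Value |
|---|---|---|---|
| enabled | Whether to enable the chunk relevancy sort. | boolean | false |
| relevancyLevelsToConsider | The relevancy levels to consider. | array | ["high", "medium", "low"] |
| relevancyLevelOrder | The relevancy level order. | object | {"high": 0, "medium": 1, "low": 2} |
| languageModel | The language model to use for the chunk relevancy sort. | string, LanguageModelName, or LanguageModelInfo | AZURE_GPT_35_TURBO_0125 |
| fallbackLanguageModel | The fallback language model to use for the chunk relevancy sort. | string, LanguageModelName, or LanguageModelInfo | AZURE_GPT_35_TURBO_0125 |
| additionalLlmOptions | Additional parameters given to the LLM. | dict | {} |
| maxTasks | The maximum number of parallel tasks to use for the chunk relevancy sort. | integer | null |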
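One plausible reading of relevancyLevelsToConsider and relevancyLevelOrder is sketched below. How the service applies these fields internally is not described on this page, so the sketch assumes that chunks whose level is not listed in relevancyLevelsToConsider are dropped and that the remaining chunks are sorted by the integer assigned in relevancyLevelOrder (lowest first). The helper name `apply_sort_config` and the sample data are hypothetical.

```python
def apply_sort_config(labelled_chunks: list[tuple[str, str]], config: dict) -> list[tuple[str, str]]:
    """Filter (chunk, level) pairs by allowed levels, then sort by configured order."""
    allowed = set(config["relevancyLevelsToConsider"])
    order = config["relevancyLevelOrder"]
    # Assumption: levels not listed in relevancyLevelsToConsider are discarded.
    kept = [(text, level) for text, level in labelled_chunks if level in allowed]
    # Assumption: a smaller order value means the chunk is placed earlier.
    return sorted(kept, key=lambda item: order[item[1]])


config = {
    "relevancyLevelsToConsider": ["high", "medium"],
    "relevancyLevelOrder": {"high": 0, "medium": 1, "low": 2},
}
labelled = [("a", "low"), ("b", "medium"), ("c", "high"), ("d", "medium")]
sorted_chunks = apply_sort_config(labelled, config)
print(sorted_chunks)
```

With this configuration the "low" chunk is removed and the "high" chunk moves to the front, while the two "medium" chunks keep their original relative order.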
Dependencies
This table describes conditions where fields depend on other fields.
Field: The dependent field in the schema.
Depends On: The field that influences the condition.
Condition: The specific circumstance dictating dependency.
| Field | Depends On | Condition |
|---|---|---|
| languageModel | fallbackLanguageModel | If languageModel fails, fallbackLanguageModel is used. |
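The fallback condition in the table can be illustrated with a small sketch. It assumes that "fails" means the call raises an exception and that the fallback is tried exactly once; neither detail is specified on this page. `classify_with_fallback` and `fake_call_model` are hypothetical names, and the stub simulates an outage instead of calling a real model.

```python
from typing import Callable


def classify_with_fallback(
    call_model: Callable[[str, str], str],
    prompt: str,
    language_model: str,
    fallback_language_model: str,
) -> str:
    """Try the primary model first; on any error, retry once with the fallback."""
    try:
        return call_model(language_model, prompt)
    except Exception:
        # Assumption: any exception counts as a failure of the primary model.
        return call_model(fallback_language_model, prompt)


calls: list[str] = []


def fake_call_model(model: str, prompt: str) -> str:
    """Hypothetical client stub: the primary model always fails here."""
    calls.append(model)
    if model == "AZURE_GPT_4o_2024_0806":
        raise RuntimeError("simulated outage")
    return "medium"


level = classify_with_fallback(
    fake_call_model,
    "Rate the relevance of this chunk to the query.",
    language_model="AZURE_GPT_4o_2024_0806",
    fallback_language_model="AZURE_GPT_35_TURBO_0125",
)
print(level, calls)
```

If the fallback call also fails, this sketch lets the exception propagate to the caller.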
Full JSON Schema
{
"$defs": {
"EncoderName": {
"enum": [
"o200k_base",
"cl100k_base"
],
"title": "EncoderName",
"type": "string"
},
"LanguageModelInfo": {
"properties": {
"name": {
"anyOf": [
{
"$ref": "#/$defs/LanguageModelName"
},
{
"type": "string"
}
],
"title": "Name"
},
"version": {
"title": "Version",
"type": "string"
},
"provider": {
"$ref": "#/$defs/LanguageModelProvider"
},
"encoder_name": {
"$ref": "#/$defs/EncoderName",
"default": "cl100k_base"
},
"token_limits": {
"$ref": "#/$defs/LanguageModelTokenLimits",
"default": {
"token_limit_input": 7000,
"token_limit_output": 1000
}
},
"capabilities": {
"default": [
"streaming"
],
"items": {
"$ref": "#/$defs/ModelCapabilities"
},
"title": "Capabilities",
"type": "array"
},
"info_cutoff_at": {
"anyOf": [
{
"format": "date",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Info Cutoff At"
},
"published_at": {
"anyOf": [
{
"format": "date",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Published At"
},
"retirement_at": {
"anyOf": [
{
"format": "date",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Retirement At"
},
"deprecated_at": {
"anyOf": [
{
"format": "date",
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Deprecated At"
},
"retirement_text": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Retirement Text"
}
},
"required": [
"name",
"version",
"provider"
],
"title": "LanguageModelInfo",
"type": "object"
},
"LanguageModelName": {
"enum": [
"AZURE_GPT_35_TURBO_0125",
"AZURE_GPT_4_0613",
"AZURE_GPT_4_32K_0613",
"AZURE_GPT_4_TURBO_2024_0409",
"AZURE_GPT_4o_2024_0513",
"AZURE_GPT_4o_2024_0806",
"AZURE_GPT_4o_MINI_2024_0718",
"AZURE_o1_PREVIEW_2024_0912",
"AZURE_o1_2024_1217",
"AZURE_o1_MINI_2024_0912",
"AZURE_o3_MINI_2025_0131",
"AZURE_GPT_45_PREVIEW_2025_0227"
],
"title": "LanguageModelName",
"type": "string"
},
"LanguageModelProvider": {
"enum": [
"AZURE",
"CUSTOM"
],
"title": "LanguageModelProvider",
"type": "string"
},
"LanguageModelTokenLimits": {
"properties": {
"token_limit_input": {
"title": "Token Limit Input",
"type": "integer"
},
"token_limit_output": {
"title": "Token Limit Output",
"type": "integer"
}
},
"required": [
"token_limit_input",
"token_limit_output"
],
"title": "LanguageModelTokenLimits",
"type": "object"
},
"ModelCapabilities": {
"enum": [
"function_calling",
"parallel_function_calling",
"reproducible_output",
"structured_output",
"vision",
"streaming",
"reasoning"
],
"title": "ModelCapabilities",
"type": "string"
}
},
"properties": {
"enabled": {
"default": false,
"description": "Whether to enable the chunk relevancy sort.",
"title": "Enabled",
"type": "boolean"
},
"relevancyLevelsToConsider": {
"default": [
"high",
"medium",
"low"
],
"description": "The relevancy levels to consider.",
"items": {
"type": "string"
},
"title": "Relevancylevelstoconsider",
"type": "array"
},
"relevancyLevelOrder": {
"additionalProperties": {
"type": "integer"
},
"default": {
"high": 0,
"medium": 1,
"low": 2
},
"description": "The relevancy level order.",
"title": "Relevancylevelorder",
"type": "object"
},
"languageModel": {
"anyOf": [
{
"type": "string"
},
{
"$ref": "#/$defs/LanguageModelName"
},
{
"$ref": "#/$defs/LanguageModelInfo"
}
],
"default": {
"name": "AZURE_GPT_35_TURBO_0125",
"version": "0125",
"provider": "AZURE",
"encoder_name": "cl100k_base",
"token_limits": {
"token_limit_input": 16385,
"token_limit_output": 4096
},
"capabilities": [
"structured_output",
"function_calling",
"parallel_function_calling",
"reproducible_output"
],
"info_cutoff_at": "2021-09-01",
"published_at": "2023-01-25",
"retirement_at": "0005-03-31",
"deprecated_at": null,
"retirement_text": null
},
"description": "The language model to use for the chunk relevancy sort."
},
"fallbackLanguageModel": {
"anyOf": [
{
"type": "string"
},
{
"$ref": "#/$defs/LanguageModelName"
},
{
"$ref": "#/$defs/LanguageModelInfo"
}
],
"default": {
"name": "AZURE_GPT_35_TURBO_0125",
"version": "0125",
"provider": "AZURE",
"encoder_name": "cl100k_base",
"token_limits": {
"token_limit_input": 16385,
"token_limit_output": 4096
},
"capabilities": [
"structured_output",
"function_calling",
"parallel_function_calling",
"reproducible_output"
],
"info_cutoff_at": "2021-09-01",
"published_at": "2023-01-25",
"retirement_at": "0005-03-31",
"deprecated_at": null,
"retirement_text": null
},
"description": "The fallback language model to use for the chunk relevancy sort."
},
"maxTasks": {
"default": null,
"description": "The maximum number of tasks to use for the chunk relevancy sort.",
"title": "Maxtasks",
"type": "integer"
}
},
"title": "ChunkRelevancySortConfig",
"type": "object"
}