Chunk Relevancy Sort

The service is integrated into the following spaces and modules:

Functionality

The Chunk Relevancy Service is an optional post-processing service that improves the quality of search results by re-ranking text chunks retrieved from semantic vector search and/or combined search. Instead of relying solely on vector similarity and/or full-text search, it evaluates the actual relevance of each chunk to the user’s query using a language model.

Purpose

While vector search and full-text search are fast and effective, they may overlook subtle context or nuanced details. The Chunk Relevancy Service improves precision by having a language model assess each chunk individually, so that the most contextually relevant information is prioritized in the final response.

How It Works

1. Initial Chunk Retrieval
   A search query returns a list of top-ranked chunks from the vector search or a combined full-text and vector search.
2. Per-Chunk Relevance Evaluation
   Each chunk is then passed, alongside the search query, into a dedicated language model call. The model evaluates the chunk’s relevance in context and classifies it as:
   - High
   - Medium
   - Low
3. Final Re-ranking
   Based on the model-assigned relevance levels, the service reorders the chunks so that the most relevant content appears first.
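
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual implementation: `rerank_chunks` and `keyword_classifier` are hypothetical names, and the classifier is a simple word-overlap stand-in for the real language model call.

```python
from typing import Callable

# Default order mirrors the documented default for relevancyLevelOrder.
LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}


def rerank_chunks(
    query: str,
    chunks: list[str],
    classify: Callable[[str, str], str],
) -> list[str]:
    """Classify every chunk, then reorder by relevance level.

    sorted() is stable, so chunks with the same level keep their
    original retrieval order.
    """
    labelled = [(classify(query, chunk), chunk) for chunk in chunks]
    ranked = sorted(labelled, key=lambda pair: LEVEL_ORDER[pair[0]])
    return [chunk for _, chunk in ranked]


def keyword_classifier(query: str, chunk: str) -> str:
    """Hypothetical stand-in for the LLM call: counts shared words."""
    overlap = len(set(query.lower().split()) & set(chunk.lower().split()))
    if overlap >= 2:
        return "high"
    return "medium" if overlap == 1 else "low"


chunks = [
    "Office opening hours",
    "Vacation policy for employees",
    "Employees request vacation days online",
]
reranked = rerank_chunks("vacation policy", chunks, keyword_classifier)
```

In this sketch the second chunk is classified as high, the third as medium, and the first as low, so the retrieval order is rearranged accordingly. Whether the real service preserves the original order within one level is an assumption here.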

Why Use Chunk Relevancy Sorting?

✅ Increased Accuracy: Goes beyond token similarity and keyword search by evaluating actual semantic relevance.
✅ Detail-Aware: Captures subtle context and phrasing missed by embeddings alone.

Trade-offs and Performance Impact

LLM Call per Chunk: Each chunk requires its own LLM call.
For example, re-ranking the top 100 chunks results in 100 individual LLM calls.
Latency: Evaluating every chunk with an LLM adds processing time to each search, even when the calls run in parallel (see maxTasks).
Cost: LLM usage increases significantly with the number of chunks being evaluated.
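
To make the call count and the role of a parallelism limit concrete, here is a rough sketch using Python's `asyncio`. It assumes that `maxTasks` acts as a cap on concurrent evaluations and that `null` means no cap; `evaluate_all` and `FakeModel` are hypothetical names, and the fake model only sleeps briefly instead of calling an LLM.

```python
import asyncio


async def evaluate_all(chunks, classify, max_tasks=None):
    """Run one classify() call per chunk, at most max_tasks at a time.

    max_tasks=None means no limit here; that is an assumption of this sketch.
    """
    limit = max_tasks if max_tasks is not None else max(len(chunks), 1)
    semaphore = asyncio.Semaphore(limit)

    async def guarded(chunk):
        async with semaphore:
            return await classify(chunk)

    # gather() returns results in the same order as the input chunks.
    return await asyncio.gather(*(guarded(c) for c in chunks))


class FakeModel:
    """Hypothetical stand-in for an LLM client; tracks call statistics."""

    def __init__(self):
        self.calls = 0    # total number of classify() calls
        self.running = 0  # calls currently in flight
        self.peak = 0     # highest number of simultaneous calls seen

    async def classify(self, chunk):
        self.calls += 1
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.01)  # simulated model latency
        self.running -= 1
        return "medium"


model = FakeModel()
top_chunks = [f"chunk {i}" for i in range(100)]
levels = asyncio.run(evaluate_all(top_chunks, model.classify, max_tasks=10))
```

With 100 chunks the fake model is called 100 times, but never more than 10 calls are in flight at once. A lower limit reduces load on the model endpoint at the price of higher total latency.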

Configuration

The ChunkRelevancySortConfig schema defines the settings for sorting data chunks based on relevancy.

Default Configuration

json

{
  "enabled": false,
  "relevancyLevelsToConsider": [
    "high",
    "medium",
    "low"
  ],
  "relevancyLevelOrder": {
    "high": 0,
    "medium": 1,
    "low": 2
  },
  "languageModel": "AZURE_GPT_35_TURBO_0125",
  "fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
  "additionalLlmOptions": {},
  "maxTasks": null
}

Fields Documentation

Field Name	Description	Type	Default Value
`enabled`	Whether to enable the chunk relevancy sort.	boolean	`false`
`relevancyLevelsToConsider`	The relevancy levels to consider.	array	`["high", "medium", "low"]`
`relevancyLevelOrder`	The relevancy level order.	object	`{"high": 0, "medium": 1, "low": 2}`
`languageModel`	The language model to use for the chunk relevancy sort.	LLM Availability Overview	`AZURE_GPT_35_TURBO_0125`
`fallbackLanguageModel`	The fallback language model to use for the chunk relevancy sort.	LLM Availability Overview	`AZURE_GPT_35_TURBO_0125`
`additionalLlmOptions`	Additional parameters given to the LLM.	dict	`{}`
`maxTasks`	The maximum number of parallel tasks to use for the chunk relevancy sort.	integer	`null`
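
As a rough illustration of how `enabled`, `relevancyLevelsToConsider`, and `relevancyLevelOrder` could interact, consider the sketch below. The semantics are assumptions, not confirmed behavior: it assumes that a disabled service leaves the chunk order untouched and that chunks whose level is not listed in `relevancyLevelsToConsider` are dropped. `apply_config` is a hypothetical helper.

```python
DEFAULT_CONFIG = {
    "enabled": False,
    "relevancyLevelsToConsider": ["high", "medium", "low"],
    "relevancyLevelOrder": {"high": 0, "medium": 1, "low": 2},
}


def apply_config(labelled_chunks, overrides=None):
    """Filter and order (level, chunk) pairs according to the config.

    Assumed semantics, not confirmed by the documentation:
    - enabled=False returns the chunks in their original order;
    - levels missing from relevancyLevelsToConsider are dropped.
    """
    config = {**DEFAULT_CONFIG, **(overrides or {})}
    if not config["enabled"]:
        return [chunk for _, chunk in labelled_chunks]
    allowed = set(config["relevancyLevelsToConsider"])
    order = config["relevancyLevelOrder"]
    kept = [(lvl, chunk) for lvl, chunk in labelled_chunks if lvl in allowed]
    kept.sort(key=lambda pair: order[pair[0]])  # stable: ties keep input order
    return [chunk for _, chunk in kept]


labelled = [("low", "A"), ("high", "B"), ("medium", "C"), ("high", "D")]
unchanged = apply_config(labelled)
filtered = apply_config(
    labelled,
    {"enabled": True, "relevancyLevelsToConsider": ["high", "medium"]},
)
```

With the defaults, the input order A, B, C, D is returned unchanged. With the override, the low-relevance chunk A is removed and the result is B, D, C.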

Dependencies

This table describes conditions where fields depend on other fields.

Field: The dependent field in the schema.
Depends On: The field that influences the condition.
Condition: The specific circumstance dictating dependency.

Field	Depends On	Condition
languageModel	fallbackLanguageModel	If languageModel fails, fallbackLanguageModel is used.
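
The dependency in the table can be pictured as a simple try-then-retry pattern. The sketch below is illustrative only: what counts as a failure (timeout, rate limit, invalid output) is not specified in this document, so the code treats any raised exception as a failure, and all function names are hypothetical.

```python
def classify_with_fallback(query, chunk, primary, fallback):
    """Try the primary model; if it raises, retry once with the fallback."""
    try:
        return primary(query, chunk)
    except Exception:
        # Assumption: any exception from the primary model triggers the fallback.
        return fallback(query, chunk)


def failing_model(query, chunk):
    """Hypothetical primary model that is unavailable."""
    raise TimeoutError("primary model did not respond")


def backup_model(query, chunk):
    """Hypothetical fallback model with a fixed answer."""
    return "medium"


level = classify_with_fallback(
    "vacation policy",
    "Vacation policy for employees",
    failing_model,
    backup_model,
)
```

Note that in the default configuration both fields point to the same model, so a fallback only adds resilience if `fallbackLanguageModel` is set to a different model.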

Full JSON Schema

json

{
  "$defs": {
    "EncoderName": {
      "enum": [
        "o200k_base",
        "cl100k_base"
      ],
      "title": "EncoderName",
      "type": "string"
    },
    "LanguageModelInfo": {
      "properties": {
        "name": {
          "anyOf": [
            {
              "$ref": "#/$defs/LanguageModelName"
            },
            {
              "type": "string"
            }
          ],
          "title": "Name"
        },
        "version": {
          "title": "Version",
          "type": "string"
        },
        "provider": {
          "$ref": "#/$defs/LanguageModelProvider"
        },
        "encoder_name": {
          "$ref": "#/$defs/EncoderName",
          "default": "cl100k_base"
        },
        "token_limits": {
          "$ref": "#/$defs/LanguageModelTokenLimits",
          "default": {
            "token_limit_input": 7000,
            "token_limit_output": 1000
          }
        },
        "capabilities": {
          "default": [
            "streaming"
          ],
          "items": {
            "$ref": "#/$defs/ModelCapabilities"
          },
          "title": "Capabilities",
          "type": "array"
        },
        "info_cutoff_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Info Cutoff At"
        },
        "published_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Published At"
        },
        "retirement_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Retirement At"
        },
        "deprecated_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Deprecated At"
        },
        "retirement_text": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Retirement Text"
        }
      },
      "required": [
        "name",
        "version",
        "provider"
      ],
      "title": "LanguageModelInfo",
      "type": "object"
    },
    "LanguageModelName": {
      "enum": [
        "AZURE_GPT_35_TURBO_0125",
        "AZURE_GPT_4_0613",
        "AZURE_GPT_4_32K_0613",
        "AZURE_GPT_4_TURBO_2024_0409",
        "AZURE_GPT_4o_2024_0513",
        "AZURE_GPT_4o_2024_0806",
        "AZURE_GPT_4o_MINI_2024_0718",
        "AZURE_o1_PREVIEW_2024_0912",
        "AZURE_o1_2024_1217",
        "AZURE_o1_MINI_2024_0912",
        "AZURE_o3_MINI_2025_0131",
        "AZURE_GPT_45_PREVIEW_2025_0227"
      ],
      "title": "LanguageModelName",
      "type": "string"
    },
    "LanguageModelProvider": {
      "enum": [
        "AZURE",
        "CUSTOM"
      ],
      "title": "LanguageModelProvider",
      "type": "string"
    },
    "LanguageModelTokenLimits": {
      "properties": {
        "token_limit_input": {
          "title": "Token Limit Input",
          "type": "integer"
        },
        "token_limit_output": {
          "title": "Token Limit Output",
          "type": "integer"
        }
      },
      "required": [
        "token_limit_input",
        "token_limit_output"
      ],
      "title": "LanguageModelTokenLimits",
      "type": "object"
    },
    "ModelCapabilities": {
      "enum": [
        "function_calling",
        "parallel_function_calling",
        "reproducible_output",
        "structured_output",
        "vision",
        "streaming",
        "reasoning"
      ],
      "title": "ModelCapabilities",
      "type": "string"
    }
  },
  "properties": {
    "enabled": {
      "default": false,
      "description": "Whether to enable the chunk relevancy sort.",
      "title": "Enabled",
      "type": "boolean"
    },
    "relevancyLevelsToConsider": {
      "default": [
        "high",
        "medium",
        "low"
      ],
      "description": "The relevancy levels to consider.",
      "items": {
        "type": "string"
      },
      "title": "Relevancylevelstoconsider",
      "type": "array"
    },
    "relevancyLevelOrder": {
      "additionalProperties": {
        "type": "integer"
      },
      "default": {
        "high": 0,
        "medium": 1,
        "low": 2
      },
      "description": "The relevancy level order.",
      "title": "Relevancylevelorder",
      "type": "object"
    },
    "languageModel": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "$ref": "#/$defs/LanguageModelName"
        },
        {
          "$ref": "#/$defs/LanguageModelInfo"
        }
      ],
      "default": {
        "name": "AZURE_GPT_35_TURBO_0125",
        "version": "0125",
        "provider": "AZURE",
        "encoder_name": "cl100k_base",
        "token_limits": {
          "token_limit_input": 16385,
          "token_limit_output": 4096
        },
        "capabilities": [
          "structured_output",
          "function_calling",
          "parallel_function_calling",
          "reproducible_output"
        ],
        "info_cutoff_at": "2021-09-01",
        "published_at": "2023-01-25",
        "retirement_at": "0005-03-31",
        "deprecated_at": null,
        "retirement_text": null
      },
      "description": "The language model to use for the chunk relevancy sort."
    },
    "fallbackLanguageModel": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "$ref": "#/$defs/LanguageModelName"
        },
        {
          "$ref": "#/$defs/LanguageModelInfo"
        }
      ],
      "default": {
        "name": "AZURE_GPT_35_TURBO_0125",
        "version": "0125",
        "provider": "AZURE",
        "encoder_name": "cl100k_base",
        "token_limits": {
          "token_limit_input": 16385,
          "token_limit_output": 4096
        },
        "capabilities": [
          "structured_output",
          "function_calling",
          "parallel_function_calling",
          "reproducible_output"
        ],
        "info_cutoff_at": "2021-09-01",
        "published_at": "2023-01-25",
        "retirement_at": "0005-03-31",
        "deprecated_at": null,
        "retirement_text": null
      },
      "description": "The fallback language model to use for the chunk relevancy sort."
    },
    "maxTasks": {
      "default": null,
      "description": "The maximum number of tasks to use for the chunk relevancy sort.",
      "title": "Maxtasks",
      "type": "integer"
    }
  },
  "title": "ChunkRelevancySortConfig",
  "type": "object"
}