Chunk Relevancy Sort



The service is integrated into the following spaces and modules:

Functionality

The Chunk Relevancy Service is an optional post-processing service that improves the quality of search results by re-ranking text chunks retrieved from semantic vector search and/or combined search. Instead of relying solely on vector similarity and/or full-text search, it evaluates the actual relevance of each chunk to the user’s query using a language model.

Purpose

While vector search and full-text search are fast and effective, they may overlook subtle context or nuanced details. The Chunk Relevancy Service improves precision by evaluating each chunk individually against the query, so that the most contextually relevant information is prioritized in the final response.

How It Works

  1. Initial Chunk Retrieval

    A search query returns a list of top-ranked chunks from the vector search or a combined full-text and vector search.

  2. Per-Chunk Relevance Evaluation

    Each chunk is then passed, alongside the search query, into a dedicated language model call. The model evaluates the chunk’s relevance in context and classifies it as:

    • High

    • Medium

    • Low

  3. Final Re-ranking

    Based on the model-assigned relevance levels, the service reorders the chunks to ensure that the most important content appears first.
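
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the service's implementation: `classify_chunk` is a hypothetical stand-in for the per-chunk language model call (a word-overlap heuristic is used so the example runs offline), and `rerank` is an assumed name.

```python
# Default level order from the configuration: a lower number sorts first.
LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}


def classify_chunk(query: str, chunk: str) -> str:
    """Hypothetical stand-in for the LLM call: rates relevance by word overlap."""
    query_words = set(query.lower().split())
    overlap = len(query_words & set(chunk.lower().split()))
    if overlap >= 2:
        return "high"
    if overlap == 1:
        return "medium"
    return "low"


def rerank(query, chunks, classify=classify_chunk, level_order=LEVEL_ORDER):
    """Classify every chunk, then order the chunks by their relevance level."""
    labelled = [(classify(query, chunk), chunk) for chunk in chunks]
    # list.sort() is stable, so chunks with the same level keep retrieval order.
    labelled.sort(key=lambda pair: level_order[pair[0]])
    return [chunk for _, chunk in labelled]


chunks = [
    "Our office is closed on public holidays.",
    "Reset your password from the login page.",
    "The password policy requires twelve characters.",
]
reranked = rerank("how to reset password", chunks)
```

Because the sort is stable, chunks that receive the same level stay in the order the initial search returned them, so the original ranking still acts as a tie-breaker.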

Why Use Chunk Relevancy Sorting?

  • Increased Accuracy: Goes beyond token similarity and keyword search by evaluating actual semantic relevance.

  • Detail-Aware: Captures subtle context and phrasing missed by embeddings alone.

Trade-offs and Performance Impact

  • LLM Call per Chunk: Each chunk requires its own LLM call.

    For example, re-ranking the top 100 chunks results in 100 individual LLM calls.

  • Latency: Every chunk must be evaluated by the language model before re-ranking, which adds processing time. The maxTasks setting limits how many evaluations run in parallel.

  • Cost: LLM usage increases significantly with the number of chunks being evaluated.
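
To make the latency trade-off concrete, the sketch below runs one call per chunk concurrently with an upper bound, which is the kind of limit a setting such as `maxTasks` suggests. It is an assumption about how such a bound could work, not the service's code: `fake_llm_call` is a hypothetical placeholder that simulates a model call with a short sleep.

```python
import asyncio


async def fake_llm_call(query: str, chunk: str) -> str:
    """Hypothetical placeholder for one LLM relevance call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return "high" if query.lower() in chunk.lower() else "low"


async def classify_all(query, chunks, max_tasks=None):
    """Run one call per chunk, at most `max_tasks` at a time (None = no limit)."""
    limit = max_tasks if max_tasks is not None else max(len(chunks), 1)
    semaphore = asyncio.Semaphore(limit)

    async def bounded(chunk):
        async with semaphore:
            return await fake_llm_call(query, chunk)

    # gather() returns results in the same order as the input chunks.
    return await asyncio.gather(*(bounded(c) for c in chunks))


chunks = [f"chunk {i} about invoices" if i % 2 == 0 else f"chunk {i}" for i in range(10)]
levels = asyncio.run(classify_all("invoices", chunks, max_tasks=3))
```

The number of calls, and therefore the cost, is unchanged by the limit. Only the wall-clock time and the load placed on the model endpoint vary with it.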

Configuration

The ChunkRelevancySortConfig schema defines the settings for sorting data chunks based on relevancy.

Default Configuration

json
{
  "enabled": false,
  "relevancyLevelsToConsider": [
    "high",
    "medium",
    "low"
  ],
  "relevancyLevelOrder": {
    "high": 0,
    "medium": 1,
    "low": 2
  },
  "languageModel": "AZURE_GPT_35_TURBO_0125",
  "fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
  "additionalLlmOptions": {},
  "maxTasks": null
}
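
To illustrate how these settings could interact, the sketch below merges a partial override into the defaults and applies the two level settings to chunks that have already been classified. The filtering semantics are an assumption (chunks whose level is not listed in `relevancyLevelsToConsider` are dropped); `apply_levels`, the override values, and the sample data are hypothetical.

```python
import json

DEFAULT_CONFIG = json.loads("""
{
  "enabled": false,
  "relevancyLevelsToConsider": ["high", "medium", "low"],
  "relevancyLevelOrder": {"high": 0, "medium": 1, "low": 2},
  "languageModel": "AZURE_GPT_35_TURBO_0125",
  "fallbackLanguageModel": "AZURE_GPT_35_TURBO_0125",
  "additionalLlmOptions": {},
  "maxTasks": null
}
""")

# Example override: enable the sort, ignore low-relevance chunks, cap parallelism.
override = {"enabled": True, "relevancyLevelsToConsider": ["high", "medium"], "maxTasks": 10}
config = {**DEFAULT_CONFIG, **override}


def apply_levels(classified, config):
    """Assumed semantics: drop unlisted levels, then sort by relevancyLevelOrder."""
    if not config["enabled"]:
        return [chunk for chunk, _ in classified]
    allowed = set(config["relevancyLevelsToConsider"])
    order = config["relevancyLevelOrder"]
    kept = [(chunk, level) for chunk, level in classified if level in allowed]
    kept.sort(key=lambda item: order[item[1]])  # stable sort keeps ties in place
    return [chunk for chunk, _ in kept]


classified = [("A", "low"), ("B", "medium"), ("C", "high"), ("D", "medium")]
result = apply_levels(classified, config)
```

With the override above, chunk A is dropped and the remaining chunks come back as C, B, D. With the default configuration the sort is disabled and the input order is returned unchanged.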

Fields Documentation

| Field Name | Description | Type | Default Value |
| --- | --- | --- | --- |
| enabled | Whether to enable the chunk relevancy sort. | boolean | false |
| relevancyLevelsToConsider | The relevancy levels to consider. | array | ["high", "medium", "low"] |
| relevancyLevelOrder | The relevancy level order. | object | {"high": 0, "medium": 1, "low": 2} |
| languageModel | The language model to use for the chunk relevancy sort. | see LLM Availability Overview | AZURE_GPT_35_TURBO_0125 |
| fallbackLanguageModel | The fallback language model to use for the chunk relevancy sort. | see LLM Availability Overview | AZURE_GPT_35_TURBO_0125 |
| additionalLlmOptions | Additional parameters given to the LLM. | dict | {} |
| maxTasks | The maximum number of parallel tasks to use for the chunk relevancy sort. | integer | null |

Dependencies

This table describes conditions where fields depend on other fields.

  • Field: The dependent field in the schema.

  • Depends On: The field that influences the condition.

  • Condition: The specific circumstance dictating dependency.

| Field | Depends On | Condition |
| --- | --- | --- |
| languageModel | fallbackLanguageModel | If languageModel fails, fallbackLanguageModel is used. |
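
The fallback condition can be pictured with a short Python sketch. All names here are hypothetical (`call_model`, `ModelCallError`, and the placeholder model names), and the single retry is an assumption; the source only states that the fallback model is used when the primary model fails.

```python
class ModelCallError(Exception):
    """Hypothetical error raised when a model call fails."""


def call_model(model_name: str, prompt: str) -> str:
    """Hypothetical model call; the primary model is made to fail for the demo."""
    if model_name == "PRIMARY_MODEL":
        raise ModelCallError(f"{model_name} is unavailable")
    return "medium"


def classify_with_fallback(prompt, language_model, fallback_language_model):
    """Try the primary model first; on failure, retry once with the fallback."""
    try:
        return call_model(language_model, prompt), language_model
    except ModelCallError:
        return call_model(fallback_language_model, prompt), fallback_language_model


level, used_model = classify_with_fallback(
    "Is this chunk relevant to the query?", "PRIMARY_MODEL", "FALLBACK_MODEL"
)
```

Note that in the default configuration both fields name the same model, so a fallback only adds resilience when fallbackLanguageModel is set to a different model.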

Full JSON Schema

json
{
  "$defs": {
    "EncoderName": {
      "enum": [
        "o200k_base",
        "cl100k_base"
      ],
      "title": "EncoderName",
      "type": "string"
    },
    "LanguageModelInfo": {
      "properties": {
        "name": {
          "anyOf": [
            {
              "$ref": "#/$defs/LanguageModelName"
            },
            {
              "type": "string"
            }
          ],
          "title": "Name"
        },
        "version": {
          "title": "Version",
          "type": "string"
        },
        "provider": {
          "$ref": "#/$defs/LanguageModelProvider"
        },
        "encoder_name": {
          "$ref": "#/$defs/EncoderName",
          "default": "cl100k_base"
        },
        "token_limits": {
          "$ref": "#/$defs/LanguageModelTokenLimits",
          "default": {
            "token_limit_input": 7000,
            "token_limit_output": 1000
          }
        },
        "capabilities": {
          "default": [
            "streaming"
          ],
          "items": {
            "$ref": "#/$defs/ModelCapabilities"
          },
          "title": "Capabilities",
          "type": "array"
        },
        "info_cutoff_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Info Cutoff At"
        },
        "published_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Published At"
        },
        "retirement_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Retirement At"
        },
        "deprecated_at": {
          "anyOf": [
            {
              "format": "date",
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Deprecated At"
        },
        "retirement_text": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Retirement Text"
        }
      },
      "required": [
        "name",
        "version",
        "provider"
      ],
      "title": "LanguageModelInfo",
      "type": "object"
    },
    "LanguageModelName": {
      "enum": [
        "AZURE_GPT_35_TURBO_0125",
        "AZURE_GPT_4_0613",
        "AZURE_GPT_4_32K_0613",
        "AZURE_GPT_4_TURBO_2024_0409",
        "AZURE_GPT_4o_2024_0513",
        "AZURE_GPT_4o_2024_0806",
        "AZURE_GPT_4o_MINI_2024_0718",
        "AZURE_o1_PREVIEW_2024_0912",
        "AZURE_o1_2024_1217",
        "AZURE_o1_MINI_2024_0912",
        "AZURE_o3_MINI_2025_0131",
        "AZURE_GPT_45_PREVIEW_2025_0227"
      ],
      "title": "LanguageModelName",
      "type": "string"
    },
    "LanguageModelProvider": {
      "enum": [
        "AZURE",
        "CUSTOM"
      ],
      "title": "LanguageModelProvider",
      "type": "string"
    },
    "LanguageModelTokenLimits": {
      "properties": {
        "token_limit_input": {
          "title": "Token Limit Input",
          "type": "integer"
        },
        "token_limit_output": {
          "title": "Token Limit Output",
          "type": "integer"
        }
      },
      "required": [
        "token_limit_input",
        "token_limit_output"
      ],
      "title": "LanguageModelTokenLimits",
      "type": "object"
    },
    "ModelCapabilities": {
      "enum": [
        "function_calling",
        "parallel_function_calling",
        "reproducible_output",
        "structured_output",
        "vision",
        "streaming",
        "reasoning"
      ],
      "title": "ModelCapabilities",
      "type": "string"
    }
  },
  "properties": {
    "enabled": {
      "default": false,
      "description": "Whether to enable the chunk relevancy sort.",
      "title": "Enabled",
      "type": "boolean"
    },
    "relevancyLevelsToConsider": {
      "default": [
        "high",
        "medium",
        "low"
      ],
      "description": "The relevancy levels to consider.",
      "items": {
        "type": "string"
      },
      "title": "Relevancylevelstoconsider",
      "type": "array"
    },
    "relevancyLevelOrder": {
      "additionalProperties": {
        "type": "integer"
      },
      "default": {
        "high": 0,
        "medium": 1,
        "low": 2
      },
      "description": "The relevancy level order.",
      "title": "Relevancylevelorder",
      "type": "object"
    },
    "languageModel": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "$ref": "#/$defs/LanguageModelName"
        },
        {
          "$ref": "#/$defs/LanguageModelInfo"
        }
      ],
      "default": {
        "name": "AZURE_GPT_35_TURBO_0125",
        "version": "0125",
        "provider": "AZURE",
        "encoder_name": "cl100k_base",
        "token_limits": {
          "token_limit_input": 16385,
          "token_limit_output": 4096
        },
        "capabilities": [
          "structured_output",
          "function_calling",
          "parallel_function_calling",
          "reproducible_output"
        ],
        "info_cutoff_at": "2021-09-01",
        "published_at": "2023-01-25",
        "retirement_at": "0005-03-31",
        "deprecated_at": null,
        "retirement_text": null
      },
      "description": "The language model to use for the chunk relevancy sort."
    },
    "fallbackLanguageModel": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "$ref": "#/$defs/LanguageModelName"
        },
        {
          "$ref": "#/$defs/LanguageModelInfo"
        }
      ],
      "default": {
        "name": "AZURE_GPT_35_TURBO_0125",
        "version": "0125",
        "provider": "AZURE",
        "encoder_name": "cl100k_base",
        "token_limits": {
          "token_limit_input": 16385,
          "token_limit_output": 4096
        },
        "capabilities": [
          "structured_output",
          "function_calling",
          "parallel_function_calling",
          "reproducible_output"
        ],
        "info_cutoff_at": "2021-09-01",
        "published_at": "2023-01-25",
        "retirement_at": "0005-03-31",
        "deprecated_at": null,
        "retirement_text": null
      },
      "description": "The fallback language model to use for the chunk relevancy sort."
    },
    "maxTasks": {
      "default": null,
      "description": "The maximum number of tasks to use for the chunk relevancy sort.",
      "title": "Maxtasks",
      "type": "integer"
    }
  },
  "title": "ChunkRelevancySortConfig",
  "type": "object"
}