Additional ingestion configuration options
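This page lists all ingestion configuration options, the levels at which they can be set, and the other settings available in the Space (Assistant) Advanced Settings.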
Ingestion configuration
The complete `ingestionConfig` object is shown below. Where a field lists several allowed values, the first one is the default. Each value can be adjusted as described in the following subsections and set at different levels for documents.
```
{
  ingestionConfig: {
    // Core Chunking Settings
    chunkMaxTokens: 600,
    chunkMaxTokensOnePager: 1000,
    chunkMinTokens: 3,
    documentMinTokens: 25,
    // Chunking Strategy
    chunkStrategy: 'RECURSIVE_CHUNKING' | 'UNIQUE_DEFAULT_CHUNKING' | 'CUSTOM_CHUNKING_API' | 'CONTEXTUAL_CHUNKING' | 'CONTEXTUAL_CHUNKING_LIGHT',
    chunkingConfiguration: {
      systemPrompt: 'string',
      model: 'string',
      tokens: number
    },
    // Document Processing Modes
    pdfReadMode: 'DOC_INTELLIGENCE_DISABLED' | 'DOC_INTELLIGENCE_DEFAULT' | 'DOC_INTELLIGENCE_ON_TABLE' | 'DOC_INTELLIGENCE_FALLBACK' | 'PDFTODOCX_ONLY' | 'CUSTOM_SINGLE_PAGE_API',
    wordReadMode: 'MAMMOTH_ONLY' | 'DOC_INTELLIGENCE_DEFAULT' | 'CUSTOM_SINGLE_PAGE_API' | 'INGEST_WORD_AS_PDF',
    pptReadMode: 'INGEST_WITH_DEFAULT_SERVICE' | 'INGEST_PPT_AS_PDF',
    excelReadMode: 'INGEST_WITH_DEFAULT_SERVICE' | 'INGEST_EXCEL_AS_PDF',
    jpgReadMode: 'NO_INGESTION' | 'DOC_INTELLIGENCE_DEFAULT',
    // Ingestion Mode
    uniqueIngestionMode: 'INGESTION' | 'SKIP_INGESTION' | 'SKIP_EXCEL_INGESTION' | 'EXTERNAL_INGESTION',
    // Custom API Options
    customApiOptions: [] | Array<{
      customisationType: 'CUSTOM_SINGLE_PAGE_API' | 'CUSTOM_CHUNKING_API',
      apiIdentifier: 'YOUR IDENTIFIER',
      apiPayload?: '{"xxx": "yyyy"}'
    }>,
    // Format-Specific Configuration
    pdfConfig: {
      usePageBasedChunking: false
    },
    pptConfig: {
      usePageBasedChunking: false
    },
    excelConfig: {
      rowsPerChunk: number,
      tableFormat: 'MARKDOWN' | 'OBJECT',
      headerRows: [1],
      headerColumns: [],
      maxEmptyTableRows: 1,
      maxEmptyTableCols: 2,
      tableChunkTokenLimit: 2000,
      maxRows: 5000,
      maxCols: 100
    },
    csvConfig: {
      maxRows: 5000,
      maxCols: 100
    },
    vttConfig: {
      languageModel: 'string'
    },
    // Metadata Configuration
    metadata: {},
    shouldApplyToSubScopes: false,
    hideInChat: false,
    // Metadata Extraction (AI-powered)
    metadataExtractionConfig: {
      enabled: false,
      metadataSchema: {},
      languageModel: 'string',
      maxInputTokens: number
    }
  }
}
```

The ingestion configuration can be set at several levels:
- On the content object, directly at file upload: Knowledge Base - Ingestion API
- At instance level, for all companies on a tenant. Contact Unique Customer Success.
- At Space (Assistant) level via Advanced Settings. This applies to all documents uploaded in chat for that space.
- At Scope/Folder level via the API or the Admin UI.
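To make the "first value = default" rule concrete, here is a minimal TypeScript sketch that starts from a subset of the documented defaults and applies partial overrides one level at a time. The helper `resolveIngestionConfig` is hypothetical, and so is the precedence it implements (later, more specific layers win): this page does not state how the levels interact.

```typescript
// Hypothetical sketch: documented defaults plus layered partial overrides.
// The precedence (later layers override earlier ones) is an assumption.

type PdfReadMode =
  | "DOC_INTELLIGENCE_DISABLED"
  | "DOC_INTELLIGENCE_DEFAULT"
  | "DOC_INTELLIGENCE_ON_TABLE"
  | "DOC_INTELLIGENCE_FALLBACK"
  | "PDFTODOCX_ONLY"
  | "CUSTOM_SINGLE_PAGE_API";

type ChunkStrategy =
  | "RECURSIVE_CHUNKING"
  | "UNIQUE_DEFAULT_CHUNKING"
  | "CUSTOM_CHUNKING_API"
  | "CONTEXTUAL_CHUNKING"
  | "CONTEXTUAL_CHUNKING_LIGHT";

interface IngestionConfig {
  chunkMaxTokens: number;
  chunkMaxTokensOnePager: number;
  chunkMinTokens: number;
  documentMinTokens: number;
  chunkStrategy: ChunkStrategy;
  pdfReadMode: PdfReadMode;
  pdfConfig: { usePageBasedChunking: boolean };
}

// Subset of the defaults listed above (first value = default).
const DEFAULTS: IngestionConfig = {
  chunkMaxTokens: 600,
  chunkMaxTokensOnePager: 1000,
  chunkMinTokens: 3,
  documentMinTokens: 25,
  chunkStrategy: "RECURSIVE_CHUNKING",
  pdfReadMode: "DOC_INTELLIGENCE_DISABLED",
  pdfConfig: { usePageBasedChunking: false },
};

// A layer may set any subset of fields, including nested pdfConfig fields.
type Override = Partial<Omit<IngestionConfig, "pdfConfig">> & {
  pdfConfig?: Partial<IngestionConfig["pdfConfig"]>;
};

function resolveIngestionConfig(...layers: Override[]): IngestionConfig {
  // Copy the nested object so DEFAULTS is never mutated.
  let result: IngestionConfig = { ...DEFAULTS, pdfConfig: { ...DEFAULTS.pdfConfig } };
  for (const layer of layers) {
    result = {
      ...result,
      ...layer,
      // Merge nested settings field by field instead of replacing the object.
      pdfConfig: { ...result.pdfConfig, ...layer.pdfConfig },
    };
  }
  return result;
}
```

For example, `resolveIngestionConfig(spaceLayer, uploadLayer)` would let an upload-level `chunkMaxTokens` win over a space-level one, while every field neither layer sets keeps its documented default.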
Configuring Ingestion in Space (Assistant) Settings
You can configure ingestion settings at the Space level to control how documents uploaded to chat are processed. This is done through the Advanced Settings section of Space Management.
Via Admin UI
1. Navigate to Admin > Spaces.
2. Select the space to configure.
3. Click Advanced Settings.
4. In the JSON configuration, add or modify the `ingestionConfig` object.
5. Save the configuration.
Via the Assistant Configuration JSON
The ingestionConfig can be set within the assistant's settings JSON:
```json
{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "wordReadMode": "INGEST_WORD_AS_PDF",
    "chunkMaxTokens": 600,
    "chunkStrategy": "RECURSIVE_CHUNKING"
  }
}
```

Example: Enable MDI for Upload in Chat
To use Microsoft Document Intelligence processing when uploading documents to a specific space's chat:
```json
{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "wordReadMode": "INGEST_WORD_AS_PDF"
  }
}
```

Example: Custom Single Page API for Chat Uploads
To use a custom ingestion API (like Agentic Ingestion) for documents uploaded in chat:
```json
{
  "ingestionConfig": {
    "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
    "customApiOptions": [{
      "customisationType": "CUSTOM_SINGLE_PAGE_API",
      "apiIdentifier": "Unique Text and Image Extraction API",
      "apiPayload": "{}"
    }]
  }
}
```

Setting the general Unique AI ingestion mode
The `uniqueIngestionMode` field defines the overall behaviour of ingestion. There are four possible values:
| Value | Description |
|---|---|
| `INGESTION` | Default. Content is queued to be ingested by Unique. |
| `SKIP_INGESTION` | Directly sets status to |
| `SKIP_EXCEL_INGESTION` | Process all documents except Excel and CSV files (which are stored but not indexed). |
| `EXTERNAL_INGESTION` | Ingestion handled by SDK integration. Status set to |
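As an illustration of how the four modes differ, here is a hypothetical helper that decides whether Unique's own pipeline indexes a given file. The helper name and the set of extensions treated as "Excel and CSV" (`xlsx`, `xls`, `xlsm`, `csv`) are assumptions, not taken from this page.

```typescript
type UniqueIngestionMode =
  | "INGESTION"
  | "SKIP_INGESTION"
  | "SKIP_EXCEL_INGESTION"
  | "EXTERNAL_INGESTION";

// Assumed extensions for "Excel and CSV files"; adjust to the real file types.
const SPREADSHEET_EXTENSIONS = new Set(["xlsx", "xls", "xlsm", "csv"]);

// Hypothetical helper: is this file indexed by Unique under the given mode?
function isIndexedByUnique(fileName: string, mode: UniqueIngestionMode): boolean {
  switch (mode) {
    case "INGESTION":
      return true; // queued for ingestion by Unique
    case "SKIP_INGESTION":
      return false; // Unique does not ingest the content
    case "EXTERNAL_INGESTION":
      return false; // ingestion is handled by an SDK integration instead
    case "SKIP_EXCEL_INGESTION": {
      const dot = fileName.lastIndexOf(".");
      const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
      return !SPREADSHEET_EXTENSIONS.has(ext); // spreadsheets are stored, not indexed
    }
  }
}
```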
Chunk Strategy Options
| Value | Description |
|---|---|
| `RECURSIVE_CHUNKING` | Default strategy. Recursively splits text at natural boundaries (paragraphs, sentences, words). |
| `UNIQUE_DEFAULT_CHUNKING` | Legacy strategy, now maps to |
| `CUSTOM_CHUNKING_API` | Uses an external custom API for chunking. Requires |
| `CONTEXTUAL_CHUNKING` | Advanced strategy that generates per-chunk summaries using an LLM to improve retrieval. |
| `CONTEXTUAL_CHUNKING_LIGHT` | Lighter version that generates a single document summary prepended to all chunks. |
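The idea behind `RECURSIVE_CHUNKING` (split at the coarsest natural boundary first, and fall back to finer boundaries only when a piece is still too large) can be sketched as follows. This is not Unique's implementation: tokens are approximated by whitespace-separated words rather than a real tokenizer, and pieces are re-joined with single spaces, so paragraph breaks inside a chunk are not preserved.

```typescript
// Approximate token count by whitespace-separated words (an assumption;
// a real pipeline would use the embedding model's tokenizer).
const countTokens = (text: string): number =>
  text.split(/\s+/).filter((w) => w.length > 0).length;

// Separators from coarsest to finest: paragraphs, sentences, words.
// The sentence regex is built at runtime to keep the lookbehind portable.
const SEPARATORS: RegExp[] = [/\n\s*\n/, new RegExp("(?<=[.!?])\\s+"), /\s+/];

function recursiveChunk(text: string, maxTokens: number, level = 0): string[] {
  const trimmed = text.trim();
  if (trimmed.length === 0) return [];
  // Small enough, or no finer separator left: keep as one chunk.
  if (countTokens(trimmed) <= maxTokens || level >= SEPARATORS.length) {
    return [trimmed];
  }
  const parts = trimmed.split(SEPARATORS[level]).filter((p) => p.trim().length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const part of parts) {
    const candidate = current ? `${current} ${part}` : part;
    if (countTokens(candidate) <= maxTokens) {
      current = candidate; // greedily pack neighbouring pieces together
    } else {
      if (current) chunks.push(current);
      if (countTokens(part) > maxTokens) {
        // This piece alone is too large: split it with the next separator.
        chunks.push(...recursiveChunk(part, maxTokens, level + 1));
        current = "";
      } else {
        current = part;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

With a limit of 3 "tokens", the text `"One two three. Four five six.\n\nSeven eight nine ten eleven."` is split first by paragraph, then by sentence, and the last sentence falls back to word boundaries.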
Chunking Configuration (for Contextual Chunking)
| Field | Type | Description |
|---|---|---|
| `systemPrompt` | string | Custom system prompt for the summarization LLM. |
| `model` | string | The LLM model to use for summarization. |
| `tokens` | number | Token limit for generated summaries. |
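The difference between the two contextual strategies is easiest to see in code. Below is a minimal sketch of `CONTEXTUAL_CHUNKING_LIGHT` as described in the strategy table: one document summary, generated with the `chunkingConfiguration` settings, prepended to every chunk. The `Summarize` function type is a hypothetical stand-in for the LLM call (a real call would be asynchronous), and the blank-line separator between summary and chunk is an assumption.

```typescript
// Shape of chunkingConfiguration as documented above.
interface ChunkingConfiguration {
  systemPrompt: string;
  model: string;
  tokens: number;
}

// Hypothetical stand-in for the summarization LLM call.
type Summarize = (text: string, cfg: ChunkingConfiguration) => string;

function contextualChunkingLight(
  documentText: string,
  chunks: string[],
  cfg: ChunkingConfiguration,
  summarize: Summarize,
): string[] {
  // A single summary for the whole document (one LLM call per document)...
  const summary = summarize(documentText, cfg);
  // ...prepended to every chunk so each one carries document-level context.
  return chunks.map((chunk) => `${summary}\n\n${chunk}`);
}
```

`CONTEXTUAL_CHUNKING`, by contrast, generates a summary per chunk, so it needs one LLM call per chunk instead of one per document.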
PDF Read Mode Options
| Value | Description |
|---|---|
| `DOC_INTELLIGENCE_DISABLED` | Default. Do not use Document Intelligence; use standard PDF parsing only. |
| `DOC_INTELLIGENCE_DEFAULT` | Always use Azure Document Intelligence for PDF processing. Best for complex layouts and tables. |
| `DOC_INTELLIGENCE_ON_TABLE` | Use Document Intelligence only when tables are detected. |
| `DOC_INTELLIGENCE_FALLBACK` | Use Document Intelligence as a fallback when standard parsing fails. |
| `PDFTODOCX_ONLY` | Convert PDF to DOCX format first, then process. |
| `CUSTOM_SINGLE_PAGE_API` | Use a custom external API for page-by-page processing. Requires |
Word Read Mode Options
| Value | Description |
|---|---|
| `MAMMOTH_ONLY` | Default. Use the Mammoth library for Word-to-HTML conversion. |
| `DOC_INTELLIGENCE_DEFAULT` | Use Azure Document Intelligence for Word processing. |
| `CUSTOM_SINGLE_PAGE_API` | Use a custom external API for processing. |
| `INGEST_WORD_AS_PDF` | Convert Word to PDF first, then process using the PDF pipeline. |
PowerPoint Read Mode Options
| Value | Description |
|---|---|
| `INGEST_WITH_DEFAULT_SERVICE` | Default. Use the default PowerPoint processing service. |
| `INGEST_PPT_AS_PDF` | Convert PowerPoint to PDF first, then process using the PDF pipeline. |
Excel Read Mode Options
| Value | Description |
|---|---|
| `INGEST_WITH_DEFAULT_SERVICE` | Default. Use the default Excel processing service with table extraction. |
| `INGEST_EXCEL_AS_PDF` | Convert Excel to PDF first, then process using the PDF pipeline. |
Image/JPG Read Mode Options
| Value | Description |
|---|---|
| `NO_INGESTION` | Default. Skip image ingestion entirely. |
| `DOC_INTELLIGENCE_DEFAULT` | Use Azure Document Intelligence OCR to extract text from images. |
Core Chunking Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `chunkMaxTokens` | number | 600 | Maximum number of tokens per chunk. Azure OpenAI supports up to 2048, but 600 is recommended for optimal retrieval. |
| `chunkMinTokens` | number | 3 | Minimum tokens required for a chunk. Chunks below this are merged with adjacent chunks. |
| `chunkMaxTokensOnePager` | number | 1000 | Maximum tokens for "one-pager" documents that should not be split. |
| `documentMinTokens` | number | 25 | Minimum tokens required for a document to be ingested. Documents below this are skipped. |
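How the four parameters could interact is sketched below. Two points are assumptions rather than documented behaviour: that a document within `chunkMaxTokensOnePager` is kept as a single chunk, and that an undersized chunk is folded into its predecessor (or its successor when it is the first chunk). The candidate chunk sizes are assumed to come from a splitter that already respects `chunkMaxTokens`; merging can push a chunk slightly past that limit.

```typescript
interface CoreChunkingParams {
  chunkMaxTokens: number; // assumed to be enforced by the upstream splitter
  chunkMinTokens: number;
  chunkMaxTokensOnePager: number;
  documentMinTokens: number;
}

type ChunkPlan =
  | { kind: "skip" } // document below documentMinTokens
  | { kind: "single" } // assumed: fits chunkMaxTokensOnePager, kept whole
  | { kind: "split"; chunkTokens: number[] };

// Fold chunks smaller than minTokens into a neighbouring chunk.
function mergeSmallChunks(counts: number[], minTokens: number): number[] {
  const merged: number[] = [];
  for (const c of counts) {
    if (c < minTokens && merged.length > 0) {
      merged[merged.length - 1] += c; // merge into the preceding chunk
    } else {
      merged.push(c);
    }
  }
  // A too-small first chunk has no predecessor, so merge it into the next one.
  if (merged.length > 1 && merged[0] < minTokens) {
    merged[1] += merged[0];
    merged.shift();
  }
  return merged;
}

function planChunks(
  documentTokens: number,
  candidateChunkTokens: number[],
  p: CoreChunkingParams,
): ChunkPlan {
  if (documentTokens < p.documentMinTokens) return { kind: "skip" };
  if (documentTokens <= p.chunkMaxTokensOnePager) return { kind: "single" };
  return { kind: "split", chunkTokens: mergeSmallChunks(candidateChunkTokens, p.chunkMinTokens) };
}
```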
Format-Specific Configuration
PDF Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `usePageBasedChunking` | boolean | false | If true, creates separate chunks for each page rather than merging across pages. |
PowerPoint Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `usePageBasedChunking` | boolean | false | If true, creates separate chunks for each slide. |
Excel Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `rowsPerChunk` | number | - | Number of rows per chunk. If not set, uses token-based chunking. |
| `tableFormat` | enum | `MARKDOWN` | Output format for tables (`MARKDOWN` or `OBJECT`). |
| `headerRows` | number[] | [1] | Array of row indices (1-based) to treat as header rows. |
| `headerColumns` | number[] | [] | Array of column indices (1-based) to treat as header columns. |
| `maxEmptyTableRows` | number | 1 | Maximum consecutive empty rows allowed before a table is split. |
| `maxEmptyTableCols` | number | 2 | Maximum consecutive empty columns allowed. |
| `tableChunkTokenLimit` | number | 2000 | Maximum tokens per table chunk. |
| `maxRows` | number | 5000 | Maximum rows allowed. Ingestion fails if exceeded. |
| `maxCols` | number | 100 | Maximum columns allowed. Ingestion fails if exceeded. |
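A minimal sketch of what `rowsPerChunk`, `headerRows`, `maxRows`, and `maxCols` mean for a sheet rendered in the `MARKDOWN` table format: the limits are checked first, then header rows are repeated at the top of every row-based chunk. This is an illustration only; the helper `chunkSheetAsMarkdown` is hypothetical, and empty-row handling, `headerColumns`, and `tableChunkTokenLimit` are not modelled.

```typescript
// Subset of excelConfig used by this sketch.
interface ExcelChunkConfig {
  rowsPerChunk?: number;
  headerRows: number[]; // 1-based row indices
  maxRows: number;
  maxCols: number;
}

// Render one row as a Markdown table row, padding missing cells.
const markdownRow = (cells: string[], width: number): string =>
  `| ${Array.from({ length: width }, (_, i) => cells[i] ?? "").join(" | ")} |`;

function chunkSheetAsMarkdown(rows: string[][], cfg: ExcelChunkConfig): string[] {
  const width = rows.reduce((max, r) => Math.max(max, r.length), 0);
  // Exceeding maxRows or maxCols fails ingestion.
  if (rows.length > cfg.maxRows || width > cfg.maxCols) {
    throw new Error(`Sheet too large: ${rows.length} rows x ${width} columns`);
  }
  const headerIdx = new Set(cfg.headerRows.map((n) => n - 1)); // to 0-based
  const header = rows.filter((_, i) => headerIdx.has(i));
  const body = rows.filter((_, i) => !headerIdx.has(i));
  // Without rowsPerChunk the real service uses token-based chunking;
  // this sketch simply keeps the whole body in one chunk in that case.
  const size = Math.max(1, cfg.rowsPerChunk ?? body.length);

  const headerLines = header.map((r) => markdownRow(r, width));
  if (headerLines.length > 0) {
    headerLines.push(markdownRow(Array.from({ length: width }, () => "---"), width));
  }

  const chunks: string[] = [];
  for (let start = 0; start < body.length; start += size) {
    const bodyLines = body.slice(start, start + size).map((r) => markdownRow(r, width));
    chunks.push([...headerLines, ...bodyLines].join("\n")); // header repeated per chunk
  }
  return chunks;
}
```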
CSV Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `maxRows` | number | 5000 | Maximum rows allowed in CSV files. |
| `maxCols` | number | 100 | Maximum columns allowed in CSV files. |
VTT (Video Transcript) Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `languageModel` | string | - | LLM model to use for transcript processing. |
Custom API Options
| Field | Type | Description |
|---|---|---|
| `customApiOptions` | array | Configuration for custom processing APIs. |
CustomApiOptions Object:
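| Field | Type | Description |
|---|---|---|
| `customisationType` | enum | Type of customization: `CUSTOM_SINGLE_PAGE_API` or `CUSTOM_CHUNKING_API`. |
| `apiIdentifier` | string | Identifier of the registered custom API endpoint. |
| `apiPayload` | string | Optional JSON payload to send to the custom API. |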
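Note that `apiPayload` is a JSON document encoded as a string (for example `"{}"`), not a nested object. A small hypothetical helper can build `customApiOptions` entries and serialise the payload with `JSON.stringify`, so the encoding is always valid JSON:

```typescript
type CustomisationType = "CUSTOM_SINGLE_PAGE_API" | "CUSTOM_CHUNKING_API";

interface CustomApiOption {
  customisationType: CustomisationType;
  apiIdentifier: string;
  apiPayload?: string; // JSON encoded as a string, e.g. "{}"
}

// Hypothetical helper: build one customApiOptions entry; the payload object
// is serialised to a string only when it is provided.
function customApiOption(
  customisationType: CustomisationType,
  apiIdentifier: string,
  payload?: Record<string, unknown>,
): CustomApiOption {
  const option: CustomApiOption = { customisationType, apiIdentifier };
  if (payload !== undefined) {
    option.apiPayload = JSON.stringify(payload);
  }
  return option;
}

// Usage: reproduces the "Custom Single Page API for Chat Uploads" example.
const ingestionConfig = {
  pdfReadMode: "CUSTOM_SINGLE_PAGE_API",
  customApiOptions: [
    customApiOption("CUSTOM_SINGLE_PAGE_API", "Unique Text and Image Extraction API", {}),
  ],
};
console.log(JSON.stringify({ ingestionConfig }, null, 2));
```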
Metadata Configuration
| Field | Type | Default | Description |
|---|---|---|---|
| `metadata` | object | - | Key-value pairs of custom metadata to attach to all ingested content. |
| `shouldApplyToSubScopes` | boolean | false | If true, applies this configuration to all child folders when set on a scope. |
| `hideInChat` | boolean | false | If true, content is indexed but hidden from chat search results. |
Metadata Extraction (AI-powered)
| Field | Type | Default | Description |
|---|---|---|---|
| `metadataExtractionConfig` | object | - | Configuration for automatic metadata extraction using LLMs. |
MetadataExtractionConfig Object:
| Field | Type | Description |
|---|---|---|
| `enabled` | boolean | Whether to enable automatic metadata extraction. |
| `metadataSchema` | object | Schema defining which metadata fields to extract. |
| `languageModel` | string | LLM model to use for extraction. |
| `maxInputTokens` | number | Maximum input tokens to send to the LLM. |
MetadataFieldSchema (for each field in metadataSchema):
| Field | Type | Description |
|---|---|---|
| `type` | enum | Field type (the example below uses `string` and `array`). |
| `description` | string | Description to help the LLM understand what to extract. |
| `required` | boolean | Whether this field must be extracted. |
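To show how the `type` and `required` flags relate to extracted values, here is a hypothetical post-extraction check that reports missing required fields and type mismatches. This page does not say whether Unique performs such a check; the function name is illustrative, and only the `string` and `array` types used in the example are checked.

```typescript
interface MetadataFieldSchema {
  type: string; // e.g. "string" or "array"; other types are not checked here
  description: string;
  required: boolean;
}

// Only the two types used in the example are validated in this sketch.
const matchesType = (value: unknown, type: string): boolean => {
  if (type === "array") return Array.isArray(value);
  if (type === "string") return typeof value === "string";
  return true;
};

// Hypothetical check: returns a list of problems (empty when valid).
function validateExtractedMetadata(
  schema: Record<string, MetadataFieldSchema>,
  extracted: Record<string, unknown>,
): string[] {
  const problems: string[] = [];
  for (const [field, spec] of Object.entries(schema)) {
    const value = extracted[field];
    if (value === undefined || value === null) {
      if (spec.required) problems.push(`missing required field "${field}"`);
      continue; // optional fields may be absent
    }
    if (!matchesType(value, spec.type)) {
      problems.push(`field "${field}" should be of type ${spec.type}`);
    }
  }
  return problems;
}
```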
Example: AI Metadata Extraction Configuration
```json
{
  "metadataExtractionConfig": {
    "enabled": true,
    "languageModel": "gpt-4o-mini",
    "maxInputTokens": 4000,
    "metadataSchema": {
      "document_date": {
        "type": "string",
        "description": "The date of the document in ISO format (YYYY-MM-DD)",
        "required": true
      },
      "author": {
        "type": "string",
        "description": "The author or authors of the document",
        "required": false
      },
      "topics": {
        "type": "array",
        "description": "Main topics covered in the document",
        "required": true
      }
    }
  }
}
```

Other Assistant Configuration Options
Beyond ingestionConfig, there are several other configuration options that can be set in the Space/Assistant Advanced Settings.
Complete Assistant Settings Structure
```
{
  // User Interface Type
  userInterface: 'CHAT' | 'MAGIC_TABLE' | 'TRANSLATION',
  // Model Selection Strategy
  modelChoosing: 'BY_FUNCTION_CALL',
  // PDF Highlighting in Chat
  showPdfHighlighting: true | false,
  // Auto-execute prompt on space entry
  autoExecutePrompt: null | 'string',
  // Ingestion Configuration (see above)
  ingestionConfig: { ... },
  // Speech-to-Text Configuration
  sttConfig: {
    grammarList: []
  },
  // Magic Table Configuration (for Due Diligence spaces)
  magicTableConfig: {
    answerLibrary: true | false,
    hideSheetStatus: true | false
  }
}
```

Speech-to-Text Configuration (sttConfig)
The sttConfig object configures the Speech-to-Text (voice input) functionality for a space.
| Field | Type | Default | Description |
|---|---|---|---|
| `grammarList` | string[] | [] | List of phrases, words, or acronyms that help the speech recognition engine recognize company-specific terminology. |
What is grammarList?
The grammarList is an array of strings that help the speech recognition service (Microsoft Azure Speech-to-Text) better recognize domain-specific vocabulary, company names, acronyms, and technical terms that may not be in the standard vocabulary.
Use Cases
- Company names: `["UniqueAI", "Acme Corp", "TechCo"]`
- Industry acronyms: `["KPI", "ROI", "EBITDA", "P&L", "YoY"]`
- Product names: `["UniqueChat", "MagicTable", "AgenticTable"]`
- Technical terms: `["chunking", "embeddings", "vectorization"]`
Example Configuration
```json
{
  "sttConfig": {
    "grammarList": [
      "UniqueAI",
      "EBITDA",
      "YoY",
      "MoM",
      "P&L",
      "Due Diligence",
      "KYC",
      "AML"
    ]
  }
}
```

How it Works
When voice input is used in the chat, the grammar list phrases are sent to the Azure Speech-to-Text service as a PhraseListGrammar. This improves recognition accuracy for these specific terms, especially when they might otherwise be misinterpreted (e.g., "EBITDA" being recognized as "eat a" or similar).
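In the Azure Speech SDK, a phrase list is attached to a recognizer through `PhraseListGrammar`, with one `addPhrase` call per term, so every entry in `grammarList` becomes one phrase. When the list is assembled from several sources, a small hypothetical helper can trim entries and drop blanks and duplicates before they go into `sttConfig`. The helper name and the case-insensitive de-duplication are our own choices, not documented requirements.

```typescript
interface SttConfig {
  grammarList: string[];
}

// Hypothetical helper: normalise terms before storing them in sttConfig.
function buildSttConfig(terms: string[]): SttConfig {
  const seen = new Set<string>();
  const grammarList: string[] = [];
  for (const raw of terms) {
    const term = raw.trim();
    const key = term.toLowerCase();
    if (term.length === 0 || seen.has(key)) continue; // skip blanks and duplicates
    seen.add(key);
    grammarList.push(term); // keep the first spelling encountered
  }
  return { grammarList };
}

// Usage: merge a team's list with a shared acronym list.
const sttConfig = buildSttConfig(["UniqueAI", " EBITDA", "ebitda", "", "KYC", "P&L"]);
console.log(JSON.stringify({ sttConfig }));
```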
Other Settings Reference
userInterface
| Value | Description |
|---|---|
| `CHAT` | Standard chat interface (default) |
| `MAGIC_TABLE` | Magic Table / Agentic Table interface for structured data workflows |
| `TRANSLATION` | Translation-focused interface |
showPdfHighlighting
| Value | Description |
|---|---|
| `true` | Enable PDF highlighting in chat responses (default) |
| `false` | Disable PDF highlighting |
autoExecutePrompt
| Value | Description |
|---|---|
| `null` | No auto-execute prompt (default) |
| string | A prompt that automatically executes when the user enters the space |
magicTableConfig (for Due Diligence/Magic Table spaces)
| Field | Type | Default | Description |
|---|---|---|---|
| `answerLibrary` | boolean | true | Enable or disable the answer library feature |
| `hideSheetStatus` | boolean | false | Hide or show sheet status indicators |
Complete Example: Full Assistant Settings
```json
{
  "userInterface": "CHAT",
  "modelChoosing": "BY_FUNCTION_CALL",
  "showPdfHighlighting": true,
  "autoExecutePrompt": null,
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "wordReadMode": "INGEST_WORD_AS_PDF",
    "chunkMaxTokens": 600,
    "chunkStrategy": "RECURSIVE_CHUNKING"
  },
  "sttConfig": {
    "grammarList": [
      "UniqueAI",
      "EBITDA",
      "Due Diligence",
      "KYC"
    ]
  }
}
```

Configuration Examples
Example 1: Basic Space Configuration
```json
{
  "chunkMaxTokens": 600,
  "chunkMinTokens": 3,
  "chunkMaxTokensOnePager": 1000,
  "documentMinTokens": 25,
  "chunkStrategy": "RECURSIVE_CHUNKING",
  "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
  "wordReadMode": "MAMMOTH_ONLY",
  "pptReadMode": "INGEST_WITH_DEFAULT_SERVICE",
  "excelReadMode": "INGEST_WITH_DEFAULT_SERVICE",
  "jpgReadMode": "NO_INGESTION",
  "uniqueIngestionMode": "INGESTION"
}
```

Example 2: Excel-Heavy Content Configuration
```json
{
  "chunkStrategy": "RECURSIVE_CHUNKING",
  "excelReadMode": "INGEST_WITH_DEFAULT_SERVICE",
  "excelConfig": {
    "rowsPerChunk": 50,
    "tableFormat": "MARKDOWN",
    "headerRows": [1],
    "headerColumns": [1],
    "maxEmptyTableRows": 2,
    "maxEmptyTableCols": 3,
    "tableChunkTokenLimit": 2500,
    "maxRows": 10000,
    "maxCols": 200
  },
  "shouldApplyToSubScopes": true
}
```

Example 3: Contextual Chunking Configuration
```json
{
  "chunkMaxTokens": 500,
  "chunkMinTokens": 10,
  "chunkStrategy": "CONTEXTUAL_CHUNKING_LIGHT",
  "chunkingConfiguration": {
    "systemPrompt": "Summarize the key information from this document section.",
    "tokens": 150,
    "model": "gpt-4o-mini"
  },
  "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
  "uniqueIngestionMode": "INGESTION"
}
```