Agentic Image Content Extraction for Infra Admins
Overview
Field | Value |
|---|---|
Feature name | Agentic Image Content Extraction |
What it is | Vision AI figure extraction within the MDI pipeline (supersedes the deprecated Agentic PDF Document Extraction) |
What it does | Extracts textual content from figures, charts, diagrams, and other visual elements detected within PDF pages during ingestion. Uses vision-capable AI models (Azure OpenAI) to interpret cropped figure images and merges the extracted text back into the page markdown at the correct position. Enhances the standard MDI (Microsoft Document Intelligence) pipeline — does not replace it. |
Who it's for | Knowledge base admins who need higher-quality ingestion of image-heavy documents (financial reports, research papers, regulatory filings). End users benefit from charts and graphs becoming searchable in AI chat. |
When to use | Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide. |
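Deprecation | Agentic PDF Document Extraction (CUSTOM_SINGLE_PAGE_API with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement. |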
Processing Flow
View diagram “image-extraction-flow” in Confluence
Step-by-step:
node-ingestion-worker splits the PDF into individual pages
For each page, the MDI Client calls Azure Document Intelligence with extractFigures: true
MDI returns the page text/layout plus figure bounding polygons for each detected figure
The MDI Page Composer renders the PDF page as an image at the configured DPI
Each detected figure is cropped from the rendered page image using the polygon coordinates
Cropped figure images are sent, together with the language detected by MDI, to the Image Extraction Adapter (up to 5 in parallel)
The adapter creates async jobs via POST /image-content-extraction/extractions and polls for results
Inside agentic-ingestion, the worker picks up each job and applies the configured strategy:
ONE_STEP — Single vision LLM call to directly extract content from the figure
TWO_STEP — First classify the image (chart, table, diagram, icon, etc.), then extract with a category-specific prompt; non-informational categories (icons, decorative images) are skipped
The vision LLM is called via API_BASE (node-chat gateway). If the primary model fails, an automatic fallback to a secondary model is attempted
Results are returned to the composer, which merges figure texts into the page markdown at the correct positions, preserving reading order and captions
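The render-and-crop steps above can be sketched as follows. This is a minimal illustration, not the worker's actual code: it assumes the figure polygon is a flat list of x, y coordinates in inches (the unit Azure Document Intelligence reports for PDF input) and that the page was rendered at a known DPI. The function name and the padding parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def polygon_to_pixel_box(
    polygon: Sequence[float],
    dpi: int,
    page_width_px: int,
    page_height_px: int,
    padding_px: int = 0,
) -> Tuple[int, int, int, int]:
    """Convert a flat [x1, y1, x2, y2, ...] polygon in inches into a
    (left, top, right, bottom) pixel box on the rendered page image."""
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold at least three x/y pairs")
    xs = polygon[0::2]
    ys = polygon[1::2]
    # inches * DPI = pixels; round outwards so the crop never cuts the figure
    left = math.floor(min(xs) * dpi) - padding_px
    top = math.floor(min(ys) * dpi) - padding_px
    right = math.ceil(max(xs) * dpi) + padding_px
    bottom = math.ceil(max(ys) * dpi) + padding_px
    # Clamp to the rendered page so padding cannot leave the image
    return (
        max(left, 0),
        max(top, 0),
        min(right, page_width_px),
        min(bottom, page_height_px),
    )


# A figure from (1in, 2in) to (4in, 5in) on a US Letter page rendered at 150 DPI
box = polygon_to_pixel_box(
    [1.0, 2.0, 4.0, 2.0, 4.0, 5.0, 1.0, 5.0],
    dpi=150,
    page_width_px=1275,
    page_height_px=1650,
)
print(box)  # (150, 300, 600, 750)
```

A small padding value is one way to keep titles or legends that sit just outside the detected polygon, at the cost of pulling in more surrounding text.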
Fallback Behavior
Primary → Fallback model: If the primary LLM call fails for any reason other than a content filter, the request is retried with the fallback model
Per-figure resilience: If extraction fails for an individual figure → empty text for that figure; rest of page composes normally
Pipeline fallback: If entire figure extraction pipeline fails → falls back to standard MDI output (no figure text)
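The three fallback levels can be illustrated with a short sketch. All names here (ContentFilterError, call_model, the model keys) are hypothetical stand-ins, and the sketch assumes that a content-filter failure ends up as empty text for that figure; the real services may handle that case differently.

```python
from typing import Callable, List


class ContentFilterError(Exception):
    """Hypothetical error type for a content-filter failure."""


ModelCall = Callable[[str, bytes], str]  # (model key, image bytes) -> extracted text


def extract_with_fallback(image: bytes, call_model: ModelCall, primary: str, fallback: str) -> str:
    """Primary -> fallback model: retry once unless the content filter fired."""
    try:
        return call_model(primary, image)
    except ContentFilterError:
        raise  # not retried with the fallback model
    except Exception:
        return call_model(fallback, image)


def extract_page_figures(figures: List[bytes], call_model: ModelCall, primary: str, fallback: str) -> List[str]:
    """Per-figure resilience: a failed figure yields empty text, the rest continue."""
    texts = []
    for image in figures:
        try:
            texts.append(extract_with_fallback(image, call_model, primary, fallback))
        except Exception:
            texts.append("")
    return texts


def compose_page(mdi_markdown: str, run_figure_pipeline: Callable[[str], str]) -> str:
    """Pipeline fallback: if figure extraction fails as a whole, keep the MDI output."""
    try:
        return run_figure_pipeline(mdi_markdown)
    except Exception:
        return mdi_markdown


def fake_model(model: str, image: bytes) -> str:
    # Stand-in for the vision LLM, used only to exercise the three paths
    if image == b"filtered":
        raise ContentFilterError("blocked")
    if model == "primary-model" and image == b"flaky":
        raise TimeoutError("primary timed out")
    return f"{model}:{image.decode()}"


texts = extract_page_figures([b"ok", b"flaky", b"filtered"], fake_model, "primary-model", "fallback-model")
print(texts)  # ['primary-model:ok', 'fallback-model:flaky', '']
```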
Code Path (node-ingestion-worker)
PDF Ingestor Service
└─ pdfReadMode === DOC_INTELLIGENCE_DEFAULT
   └─ imageContentExtraction.enabled === true
      └─ applyDocIntelligenceOnPageWithImageContentExtraction()
         └─ MSDocumentIntelligence.analyzeDocument(page, { extractFigures: true })
            └─ MdiPageComposer.composePageWithFigures()
               ├─ Render PDF page as image (at configured DPI)
               ├─ Crop each figure using MDI polygon coordinates
               └─ AgenticIngestionImageExtractionAdapter.extractImageContent()
                  ├─ POST /image-content-extraction/extractions
                  └─ Poll GET /image-content-extraction/extractions/{job_id}
Code Path (agentic-ingestion)
POST /image-content-extraction/extractions
└─ Enqueue in Redis (taskiq:image-content-extraction)
   └─ Worker: process_image_extraction_job()
      └─ run_image_extraction()
         ├─ ONE_STEP → ClassifierAndExtractor → single vision LLM call
         └─ TWO_STEP → Classifier (categorize image)
            ├─ Non-informational category → skip (return empty)
            └─ Informational category → Extractor (category-specific prompt)
How to enable it
Enabling this feature requires changes in four places: deploying and configuring the agentic-ingestion service (A), configuring node-ingestion-worker (B), enabling the UI feature flag and language models on knowledge-upload (C), and activating the feature per space or folder (D).
A. Deploy the agentic-ingestion service
The agentic-ingestion service must be deployed and running before the feature can be used. Image content extraction is a module within this service — no separate deployment is needed, but the service itself is a prerequisite.
Helm chart: The agentic-ingestion service is deployed via its own Helm chart as part of the Agentic Ingestion bundle
Environment variables required for setup:
Env var | Description | Example | Default | Required |
|---|---|---|---|---|
API_BASE | Base URL for the Unique AI API (node-chat). Used for all LLM vision completions. | | none (mandatory) | Yes |
FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION | Enable/disable the image content extraction module. When set to "false", the /image-content-extraction endpoint is not registered. | | | No, but must be set to "true" to enable the feature |
| Enables page-batch figure extraction in | | | No |
| Keeps extracted | | | No |
REDIS_HOST | Redis hostname for job queue and result storage | | | Yes |
REDIS_PORT | Redis port | | | No |
REDIS_PASSWORD | Redis password | (from Key Vault) | | Yes |
| Redis username (if ACL is enabled) | | | Yes |
| Enable TLS for Redis (auto-enabled if port is 6380) | | | No |
If Redis runs in Cluster Mode | | | | |
| Enable Redis cluster mode instead of standalone | | | No |
| JSON array of cluster nodes (required if cluster mode is on) | | | Yes, if in Cluster mode |
| Redis cluster password | (from Key Vault) | | Yes, if in Cluster mode |
| Enable TLS for Redis cluster connections | | | No |
| Reject unauthorized TLS certificates for Redis | | | No |
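The TLS rule in the table (auto-enabled if the port is 6380) can be sketched as below. This is only an illustration of how such settings might be resolved: REDIS_HOST, REDIS_PORT and REDIS_PASSWORD are named in this document, while REDIS_TLS_ENABLED is a hypothetical variable name, and the 6379 fallback is Redis's conventional port rather than a documented default of this service.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RedisSettings:
    host: str
    port: int
    password: str
    use_tls: bool


def redis_settings_from_env(env: Mapping[str, str]) -> RedisSettings:
    """Resolve standalone Redis settings from environment variables."""
    missing = [name for name in ("REDIS_HOST", "REDIS_PASSWORD") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required env vars: {', '.join(missing)}")
    # Assumption: 6379 is used when REDIS_PORT is not set
    port = int(env.get("REDIS_PORT", "6379"))
    # Assumption: REDIS_TLS_ENABLED is a made-up name for the explicit TLS switch
    tls_flag = env.get("REDIS_TLS_ENABLED", "false").strip().lower() == "true"
    # TLS is switched on explicitly, or implicitly when the port is 6380
    return RedisSettings(
        host=env["REDIS_HOST"],
        port=port,
        password=env["REDIS_PASSWORD"],
        use_tls=tls_flag or port == 6380,
    )


settings = redis_settings_from_env(
    {"REDIS_HOST": "redis.example.internal", "REDIS_PORT": "6380", "REDIS_PASSWORD": "from-key-vault"}
)
print(settings.use_tls)  # True
```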
Environment variables optional:
Env var | Description | Example | Default | Required |
|---|---|---|---|---|
| Max concurrent async task workers for image extraction jobs | | | No |
| Timeout in ms for each LLM vision completion call | | | No |
IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS | TTL in seconds for job results stored in Redis | | 3600 | No |
| Timeout in seconds when polling for a task result internally | | | No |
| Application log level | | | No |
| Max request body size in bytes | | | No |
| Path to custom CA certificate for outbound TLS and Redis CA | | | No |
Secrets (via Key Vault / secret provider):
The following are typically wired via the Helm chart's secretProvider section from Azure Key Vault:
REDIS_HOST, REDIS_PASSWORD, REDIS_PORT: Redis connection credentials
Verification: After deployment, confirm the service is healthy by checking GET /probe and that the image extraction module is registered in the startup logs.
B. Configure node-ingestion-worker
The node-ingestion-worker needs to know where the agentic-ingestion service is:
Env var: AGENTIC_INGESTION_BASE_URL (base URL of the agentic-ingestion service, e.g., http://agentic-ingestion.chat.svc:8081)
Where: In the env: section of the node-ingestion-worker's Helm values
Optional tuning env vars (all have sensible defaults and normally do not need to be set):
Env var | Description | Example | Default | Required |
|---|---|---|---|---|
AGENTIC_INGESTION_BASE_URL | Base URL of the agentic-ingestion service | http://agentic-ingestion.chat.svc:8081 | "" | No, but must be set to enable the feature |
| Polling interval for async job status checks | | | No |
| Max wait time for a single figure extraction job | | | No |
| Retries per HTTP request | | | No |
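How the polling interval and the max wait time interact can be shown with a small loop. This is a sketch under assumptions: the status values, the shape of the job payload and the function name are hypothetical, and the real adapter additionally retries individual HTTP requests.

```python
import time
from typing import Any, Callable, Dict, Optional

TERMINAL_STATUSES = ("completed", "failed")  # hypothetical status names


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float,
    max_wait_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Dict[str, Any]]:
    """Poll a job until it reaches a terminal status or the wait budget is spent.

    Returns the final job payload, or None on timeout so the caller can
    treat the figure as "no text extracted".
    """
    deadline = clock() + max_wait_s
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() + interval_s > deadline:
            return None  # another sleep would overshoot the budget
        sleep(interval_s)


# Demo with a fake clock so the example runs instantly
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


responses = iter(
    [
        {"status": "pending"},
        {"status": "pending"},
        {"status": "completed", "text": "Revenue by quarter"},
    ]
)
job = poll_until_done(lambda: next(responses), 1.0, 10.0, sleep=fake_sleep, clock=lambda: now[0])
print(job)  # {'status': 'completed', 'text': 'Revenue by quarter'}
```

Injecting sleep and clock keeps the loop testable without real waiting; in production code the defaults would be used.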
C. Enable the UI feature flag and configure language models in the knowledge-upload app
The knowledge-upload app has a feature flag that controls whether the Image Content Extraction configuration section is visible in the folder ingestion settings UI. Without this flag, admins cannot enable the feature through the UI.
Feature flag:
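Env var | Description | Example | Default | Required |
|---|---|---|---|---|
FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 | Feature flag to enable UI | | | No, but must be set to "true" to enable the feature |
IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS | Available language models shown in the UI dropdown for image content extraction | "AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)" | "" | No, but must be set to enable the feature |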
Must be set to "true" in the env: section of the service's Helm values.
Language model options:
The available language models shown in the UI dropdown are configured via an environment variable on knowledge-upload:
Env var: IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS
Where: In the env: section of knowledge-upload's Helm values
Format: Comma-separated MODEL_KEY:Display Label pairs
Examples:
IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)"
IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)"
Key points:
The MODEL_KEY values must match model identifiers available via the platform API gateway (API_BASE / node-chat). The model must have Vision and Structured Output support. We recommend one of the following models:
AZURE_GPT_51_2025_1113 ("AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)")
This model performs best. However, it is not yet available in Switzerland and requires data processing to take place in the EU.
AZURE_GPT_4o_2024_0513 ("AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)")
Only models listed in this env var will appear in the UI dropdown. If the env var is empty or unset, no models are shown and the feature cannot be configured
There is no server-side allowlist beyond this at the moment; the backend accepts any string as languageModel. The agentic-ingestion service will log an error when the language model is not available.
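The MODEL_KEY:Display Label format can be made concrete with a small parser. This is only a sketch of how such a value could be read; how knowledge-upload actually parses the variable is not described here, and the function name is hypothetical.

```python
from typing import Dict


def parse_language_models(raw: str) -> Dict[str, str]:
    """Parse 'KEY:Label,KEY:Label' into {key: label}, keeping the listed order."""
    models: Dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty value or a trailing comma
        # Split on the first colon only, so labels may contain further colons
        key, sep, label = entry.partition(":")
        key, label = key.strip(), label.strip()
        if not sep or not key or not label:
            raise ValueError(f"expected MODEL_KEY:Display Label, got {entry!r}")
        models[key] = label
    return models


models = parse_language_models(
    "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13),AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)"
)
print(models)
# {'AZURE_GPT_51_2025_1113': 'GPT-5.1 (2025-11-13)', 'AZURE_GPT_4o_2024_0513': 'GPT-4o (2024-05-13)'}
```

An empty or unset value parses to an empty mapping, which matches the documented behavior that no models are shown in the dropdown.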
Behavior when the flag is off: If FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is "false" (or unset), the knowledge-upload UI hides the Image Content Extraction section and strips any existing imageContentExtraction config from saves, effectively disabling the feature for any newly saved ingestion configurations.
D. Per-space / per-folder: Ingestion configuration
Once the UI flag is enabled (step C), admins can activate the feature per space or folder through the Knowledge Base or Space Management UI.
Do not use CUSTOM_SINGLE_PAGE_API: it activates the deprecated Agentic PDF Document Extraction path, which is not being released. If it was previously enabled, disable it.
pdfReadMode on the folder or Space File Ingestion Configuration must be DOC_INTELLIGENCE_DEFAULT (the standard MDI pipeline, enhanced with figure extraction)
languageModel must be a vision-capable model available via the platform API gateway
Enable Image Content Extraction on a folder in the Knowledge Base:
Navigate to the folder's Ingestion Configuration settings
The Image Content Extraction section will be visible (gated by the feature flag)
Toggle it on and select a language model from the dropdown
Enable Image Content Extraction for documents uploaded in a Space:
Navigate to Space Management
Select the Space you want to configure
Depending on your Space setup:
Unique Custom Space
Click on Advanced Settings and add the ingestionConfig as shown below
Note: enabled: true is the actual per-space toggle; setting it to false (or omitting it) keeps standard MDI behavior
{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "pdfConfig": {
      "imageContentExtraction": {
        "enabled": true,
        "languageModel": "AZURE_GPT_4o_2024_0513"
      }
    }
  }
}
Unique AI Space
Navigate to Configuration > Advanced Settings > Optimization > Configure File Ingestion
Make sure PDF mode is set to “Doc Intelligence Default”
The Image Content Extraction section will be visible (gated by the feature flag)
Toggle it on and select a language model from the dropdown
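Before rolling a configuration out to many spaces, the requirements in this section can be checked mechanically. The helper below is a hypothetical pre-flight check, not part of the platform; it only encodes the three conditions stated above (pdfReadMode, enabled, languageModel).

```python
from typing import Any, Dict, List


def image_extraction_problems(config: Dict[str, Any]) -> List[str]:
    """Return the reasons why image content extraction would not be active."""
    problems: List[str] = []
    ingestion = config.get("ingestionConfig", {})
    if ingestion.get("pdfReadMode") != "DOC_INTELLIGENCE_DEFAULT":
        problems.append("pdfReadMode must be DOC_INTELLIGENCE_DEFAULT")
    extraction = ingestion.get("pdfConfig", {}).get("imageContentExtraction", {})
    if extraction.get("enabled") is not True:
        problems.append("imageContentExtraction.enabled must be true")
    if not extraction.get("languageModel"):
        problems.append("imageContentExtraction.languageModel must be set")
    return problems


good = {
    "ingestionConfig": {
        "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
        "pdfConfig": {
            "imageContentExtraction": {
                "enabled": True,
                "languageModel": "AZURE_GPT_4o_2024_0513",
            }
        },
    }
}
legacy = {"ingestionConfig": {"pdfReadMode": "CUSTOM_SINGLE_PAGE_API"}}

print(image_extraction_problems(good))    # []
print(image_extraction_problems(legacy))  # three problems listed
```

Because the backend accepts any string as languageModel, a check like this cannot confirm that the model is actually available through the gateway.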
Migrating from Agentic PDF Document Extraction
If your tenant currently uses Agentic PDF Document Extraction, it is likely configured with:
pdfReadMode: CUSTOM_SINGLE_PAGE_API
customApiOptions containing an Agentic Ingestion / Unique Text and Image Extraction API identifier
To migrate to Image Content Extraction:
Remove the customApiOptions entry for Agentic PDF Document Extraction.
Change pdfReadMode to DOC_INTELLIGENCE_DEFAULT.
Add pdfConfig.imageContentExtraction.enabled: true.
Set pdfConfig.imageContentExtraction.languageModel to a supported vision model.
Ensure AGENTIC_INGESTION_BASE_URL is configured on node-ingestion-worker.
Ensure FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is enabled where required.
Re-ingest affected documents if you want existing content to benefit from image extraction.
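The configuration part of the migration (the first four steps) could be scripted roughly as follows. This is a sketch under assumptions: it treats customApiOptions as a key directly under ingestionConfig and removes it entirely, which is a simplification if a tenant keeps other custom API entries there. The function name and the apiIdentifier field in the demo are hypothetical.

```python
import copy
from typing import Any, Dict


def migrate_ingestion_config(config: Dict[str, Any], language_model: str) -> Dict[str, Any]:
    """Return a copy of the config switched from Agentic PDF Document
    Extraction to Image Content Extraction."""
    migrated = copy.deepcopy(config)  # never mutate the stored config in place
    ingestion = migrated.setdefault("ingestionConfig", {})
    # Assumption: customApiOptions only holds the deprecated entry
    ingestion.pop("customApiOptions", None)
    ingestion["pdfReadMode"] = "DOC_INTELLIGENCE_DEFAULT"
    extraction = ingestion.setdefault("pdfConfig", {}).setdefault("imageContentExtraction", {})
    extraction["enabled"] = True
    extraction["languageModel"] = language_model
    return migrated


old = {
    "ingestionConfig": {
        "pdfReadMode": "CUSTOM_SINGLE_PAGE_API",
        "customApiOptions": [{"apiIdentifier": "hypothetical-agentic-ingestion-id"}],
    }
}
new = migrate_ingestion_config(old, "AZURE_GPT_4o_2024_0513")
print(new["ingestionConfig"]["pdfReadMode"])  # DOC_INTELLIGENCE_DEFAULT
```

The env var and feature flag steps, and the re-ingestion of existing documents, still have to be done separately.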
Prerequisites / dependencies
Prerequisite | Details |
|---|---|
agentic-ingestion service deployed and healthy | Must be deployed via Helm with Redis connectivity and |
Redis | Required for the async job queue ( |
Azure Document Intelligence (MDI) | Required in node-ingestion-worker for figure detection ( |
Azure OpenAI vision model | A vision-capable model (e.g., |
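AGENTIC_INGESTION_BASE_URL | Must be set so the adapter can reach the agentic-ingestion service. This is a new env var on node-ingestion-worker for this release. |
FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 | Must be set to "true" on knowledge-upload. |
IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS | Must be set with the available model options (comma-separated MODEL_KEY:Display Label pairs). |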
No new Key Vault secrets | N/A — the service uses the existing Redis credentials. |
No new routes / hostnames | N/A — internal cluster communication only (node-ingestion-worker → agentic-ingestion over HTTP). |
No new workload identity | N/A — uses existing service identity. |
Owner / point of contact
Role | Name / handle |
|---|---|
Owner / POC | DATA SCIENCE |
Rollback / disable steps
Quick disable (no redeployment needed)
The fastest way to disable the feature for a specific space/folder is to update the ingestion configuration via the UI or through the space configuration:
{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "pdfConfig": {
      "imageContentExtraction": {
        "enabled": false
      }
    }
  }
}
Disable UI-wide (feature flag)
To hide the feature from all admins and prevent new configurations:
Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223: "false" on knowledge-upload
Re-apply the Helm release
The Image Content Extraction section disappears from the UI
Any subsequent save of ingestion config will automatically strip existing imageContentExtraction settings
Note: Existing spaces that already have imageContentExtraction.enabled: true in their stored config will continue to use image extraction until someone re-saves their ingestion config (at which point the flag-off logic strips it). To force-disable for all spaces immediately, also disable the agentic-ingestion module (see below).
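The difference between a stored config and a re-saved config can be illustrated with a toy model of the flag-off save path. This is not the knowledge-upload implementation; the function name is hypothetical and the sketch only mirrors the behavior described in the note.

```python
import copy
from typing import Any, Dict


def apply_flag_on_save(config: Dict[str, Any], flag_enabled: bool) -> Dict[str, Any]:
    """Return what would be persisted when an ingestion config is saved."""
    saved = copy.deepcopy(config)
    if not flag_enabled:
        # Flag off: any imageContentExtraction block is stripped on save
        saved.get("ingestionConfig", {}).get("pdfConfig", {}).pop("imageContentExtraction", None)
    return saved


stored = {
    "ingestionConfig": {
        "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
        "pdfConfig": {
            "imageContentExtraction": {"enabled": True, "languageModel": "AZURE_GPT_4o_2024_0513"}
        },
    }
}

# Turning the flag off does not touch what is already stored ...
print(stored["ingestionConfig"]["pdfConfig"]["imageContentExtraction"]["enabled"])  # True
# ... only the next save removes the block
resaved = apply_flag_on_save(stored, flag_enabled=False)
print(resaved["ingestionConfig"]["pdfConfig"])  # {}
```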
Disable service-wide (agentic-ingestion)
To disable the extraction module across all tenants at the service level:
Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION: "false" in the agentic-ingestion Helm values
Re-apply the Helm release
The /image-content-extraction endpoint will no longer be registered
node-ingestion-worker will fall back to standard MDI output (no figure text) for any spaces that had it enabled
Helm rollback
If the agentic-ingestion chart was upgraded specifically for this feature:
helm rollback <release-name> -n <namespace>
Behavior on disable
In-flight jobs: Any jobs already in the Redis queue will be processed by remaining workers, or will expire after IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS (default: 3600s)
Graceful fallback: node-ingestion-worker handles agentic-ingestion being unavailable gracefully; it falls back to standard MDI output for the page (text-only, no figure content)
No data loss: Disabling the feature does not affect previously ingested documents
Known limitations / side effects
Limitation | Impact |
|---|---|
Limited extraction quality | |
MDI figure detection and cropping are not always reliable | Bounding boxes can be inaccurate, especially for large or complex figures. This can lead to incorrect splitting of full-page charts or crops that miss important visual context such as titles, legends, and footnotes. |
Duplicate text can appear | MDI runs OCR on the full page, including text inside figures. Once we run image content extraction on the cropped figure, the same text may appear twice. |
Chunking can break figure content | Extracted figure content can be split across multiple chunks because of hard chunk boundaries, which hurts readability and retrieval. |
Prompt design has a precision vs reliability trade-off | Prompts that push for detailed numeric extraction can improve results on some charts, but on others they can produce wrong estimates or empty outputs when the figure does not support that level of detail. |
Handwritten content in figures | May not be reliably extracted by the vision model. |
Very small figures (< ~50px) | May produce low-quality or empty extractions. |
Other limitations | |
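Agentic PDF Document Extraction is deprecated | The older CUSTOM_SINGLE_PAGE_API path is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement. |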
Increases per-page processing time | Each figure adds ~5–15 seconds of processing (vision LLM call). Pages with 5+ figures can add 25–35 seconds due to parallel extraction (max 5 concurrent). |
Increases LLM API costs | Each figure requires a vision model API call. Documents with many figures (e.g., financial presentations) will incur noticeably higher LLM costs. |
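For rough cost planning, the number of vision calls per document can be estimated from the figure count and the strategy. The sketch below assumes that the TWO_STEP classification is itself one vision LLM call per figure and ignores retries and fallback-model calls; both are assumptions, and the function name is hypothetical.

```python
from typing import Optional


def estimate_vision_calls(
    figure_count: int,
    strategy: str,
    informational_count: Optional[int] = None,
) -> int:
    """Estimate vision LLM calls for one document, excluding retries."""
    if figure_count < 0:
        raise ValueError("figure_count must not be negative")
    if strategy == "ONE_STEP":
        return figure_count  # one extraction call per figure
    if strategy == "TWO_STEP":
        # Worst case: every figure is informational and gets a second call
        informational = figure_count if informational_count is None else informational_count
        if not 0 <= informational <= figure_count:
            raise ValueError("informational_count must be between 0 and figure_count")
        return figure_count + informational  # classify all, extract the informational ones
    raise ValueError(f"unknown strategy: {strategy!r}")


print(estimate_vision_calls(12, "ONE_STEP"))                         # 12
print(estimate_vision_calls(12, "TWO_STEP", informational_count=9))  # 21
```

Under these assumptions TWO_STEP never uses fewer calls than ONE_STEP; what it saves is extraction work on icons and decorative images, which are skipped after classification.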
Experimental disclaimer / support expectations
When to use: Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide.
Breaking changes: None expected. The feature is additive — it enhances MDI output without changing the standard pipeline behavior when disabled.
Deprecation: Agentic PDF Document Extraction (CUSTOM_SINGLE_PAGE_API with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement.