Agentic Image Content Extraction for Infra Admins

This feature is currently in BETA. It may change before general availability in response to user and client feedback, but it is targeted to be high quality and stable. Documentation may lag behind feature updates. Use in production environments at your own discretion. Please refer to our Upgrade and Release Process for more information.

Overview

Field	Value
Feature name	Agentic Image Content Extraction
What it is	Vision AI figure extraction within the MDI pipeline (supersedes deprecated Agentic PDF Extraction)
What it does	Extracts textual content from figures, charts, diagrams, and other visual elements detected within PDF pages during ingestion. Uses vision-capable AI models (Azure OpenAI) to interpret cropped figure images and merges the extracted text back into the page markdown at the correct position. Enhances the standard MDI (Microsoft Document Intelligence) pipeline — does not replace it.
Who it's for	Knowledge base admins who need higher-quality ingestion of image-heavy documents (financial reports, research papers, regulatory filings). End users benefit from charts and graphs becoming searchable in AI chat.
When to use	Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide.
Deprecation	Agentic PDF Document Extraction (`CUSTOM_SINGLE_PAGE_API` with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement.

Processing Flow

View diagram “image-extraction-flow” in Confluence

Step-by-step:

node-ingestion-worker splits the PDF into individual pages
For each page, the MDI Client calls Azure Document Intelligence with extractFigures: true
MDI returns the page text/layout plus figure bounding polygons for each detected figure
The MDI Page Composer renders the PDF page as an image at the configured DPI
Each detected figure is cropped from the rendered page image using the polygon coordinates
Cropped figure images, together with the language detected by MDI, are sent to the Image Extraction Adapter (up to 5 in parallel)
The adapter creates async jobs via POST /image-content-extraction/extractions and polls for results
Inside agentic-ingestion, the worker picks up each job and applies the configured strategy:
- ONE_STEP — Single vision LLM call to directly extract content from the figure
- TWO_STEP — First classify the image (chart, table, diagram, icon, etc.), then extract with a category-specific prompt; non-informational categories (icons, decorative images) are skipped
The vision LLM is called via API_BASE (node-chat gateway). If the primary model fails, an automatic fallback to a secondary model is attempted
Results are returned to the composer, which merges figure texts into the page markdown at the correct positions, preserving reading order and captions
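The render-and-crop steps above can be sketched as follows. This is an illustrative sketch, not the worker's actual code: the function name and parameters are hypothetical, and it assumes the figure polygon arrives as a flat list of x/y values in inches (the unit Azure Document Intelligence uses for PDF input), so that multiplying by the render DPI gives pixel coordinates.

```python
import math
from typing import Sequence, Tuple


def polygon_to_pixel_box(
    polygon: Sequence[float],
    dpi: int,
    page_width_px: int,
    page_height_px: int,
) -> Tuple[int, int, int, int]:
    """Convert a flat [x1, y1, x2, y2, ...] polygon (assumed inches) to a pixel crop box."""
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold at least three (x, y) pairs")
    xs = polygon[0::2]
    ys = polygon[1::2]
    # Scale inches to pixels and clamp the box to the rendered page.
    left = max(0, math.floor(min(xs) * dpi))
    top = max(0, math.floor(min(ys) * dpi))
    right = min(page_width_px, math.ceil(max(xs) * dpi))
    bottom = min(page_height_px, math.ceil(max(ys) * dpi))
    if right <= left or bottom <= top:
        raise ValueError("polygon does not overlap the rendered page")
    return left, top, right, bottom


# A 3 x 3 inch figure on a US Letter page rendered at 150 DPI (1275 x 1650 px).
box = polygon_to_pixel_box(
    [1.0, 2.0, 4.0, 2.0, 4.0, 5.0, 1.0, 5.0],
    dpi=150,
    page_width_px=1275,
    page_height_px=1650,
)
print(box)  # (150, 300, 600, 750)
```

The resulting box is in (left, top, right, bottom) order, which is the form most image libraries expect for a crop call.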

Fallback Behavior

Primary → Fallback model: If primary LLM fails (except content filter), retries with fallback model
Per-figure resilience: If extraction fails for an individual figure → empty text for that figure; rest of page composes normally
Pipeline fallback: If entire figure extraction pipeline fails → falls back to standard MDI output (no figure text)
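The first two fallback levels can be illustrated with a short sketch. All names here are hypothetical (`ContentFilterError`, `call_model`, `extract_all`), and it assumes that a content-filter rejection is simply treated as a failed figure, which the text above does not spell out.

```python
from typing import Callable, List, Sequence


class ContentFilterError(Exception):
    """Hypothetical stand-in for a content-filter rejection from the LLM gateway."""


def extract_with_fallback(
    image: bytes,
    call_model: Callable[[str, bytes], str],
    primary_model: str,
    fallback_model: str,
) -> str:
    """Try the primary model; retry once on the fallback unless the content filter fired."""
    try:
        return call_model(primary_model, image)
    except ContentFilterError:
        raise  # a different model would not help, so do not retry
    except Exception:
        return call_model(fallback_model, image)


def extract_all(
    images: Sequence[bytes],
    call_model: Callable[[str, bytes], str],
    primary_model: str,
    fallback_model: str,
) -> List[str]:
    """Per-figure resilience: a failed figure yields empty text, the rest continue."""
    texts: List[str] = []
    for image in images:
        try:
            texts.append(extract_with_fallback(image, call_model, primary_model, fallback_model))
        except Exception:
            texts.append("")
    return texts
```

Because every figure produces an entry (possibly empty), the composer can still merge results by position even when some extractions fail.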

Code Path (node-ingestion-worker)

PDF Ingestor Service
  └─ pdfReadMode === DOC_INTELLIGENCE_DEFAULT
      └─ imageContentExtraction.enabled === true
          └─ applyDocIntelligenceOnPageWithImageContentExtraction()
              └─ MSDocumentIntelligence.analyzeDocument(page, { extractFigures: true })
                  └─ MdiPageComposer.composePageWithFigures()
                      ├─ Render PDF page as image (at configured DPI)
                      ├─ Crop each figure using MDI polygon coordinates
                      └─ AgenticIngestionImageExtractionAdapter.extractImageContent()
                          ├─ POST /image-content-extraction/extractions
                          └─ Poll GET /image-content-extraction/extractions/{job_id}

Code Path (agentic-ingestion)

POST /image-content-extraction/extractions
  └─ Enqueue in Redis (taskiq:image-content-extraction)
      └─ Worker: process_image_extraction_job()
          └─ run_image_extraction()
              ├─ ONE_STEP → ClassifierAndExtractor → single vision LLM call
              └─ TWO_STEP → Classifier (categorize image)
                  ├─ Non-informational category → skip (return empty)
                  └─ Informational category → Extractor (category-specific prompt)
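The strategy branch in the tree above can be read as the following sketch. It is not the service's implementation: `classify` and `extract` are injected placeholders for the vision LLM calls, and the set of non-informational category names is an assumption based on the examples given earlier (icons, decorative images).

```python
from enum import Enum
from typing import Callable, Optional


class Strategy(str, Enum):
    ONE_STEP = "ONE_STEP"
    TWO_STEP = "TWO_STEP"


# Assumed category labels; the real classifier's label set may differ.
NON_INFORMATIONAL = {"icon", "decorative"}


def run_extraction_sketch(
    image: bytes,
    strategy: Strategy,
    classify: Callable[[bytes], str],
    extract: Callable[[bytes, Optional[str]], str],
) -> str:
    """Dispatch one figure according to the configured strategy."""
    if strategy is Strategy.ONE_STEP:
        # Single vision call, no category-specific prompt.
        return extract(image, None)
    category = classify(image)
    if category in NON_INFORMATIONAL:
        return ""  # skipped: nothing worth extracting
    return extract(image, category)
```

TWO_STEP costs an extra classification call per figure but avoids spending an extraction call on icons and decorative images.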

How to enable it

Enabling this feature requires changes in four places: deploying and configuring the agentic-ingestion service (A), configuring node-ingestion-worker (B), enabling the UI feature flag and language models on knowledge-upload and admin (C), and activating the feature per space / folder (D).

A. Deploy the agentic-ingestion service

The agentic-ingestion service must be deployed and running before the feature can be used. Image content extraction is a module within this service — no separate deployment is needed, but the service itself is a prerequisite.

Helm chart: The agentic-ingestion service is deployed via its own Helm chart as part of the Agentic Ingestion bundle

Environment variables required for setup:

Env var	Description	Example	Default	Required
`API_BASE`	Base URL for the Unique AI API (node-chat). Used for all LLM vision completions.	`http://node-chat.finance-gpt.svc.cluster.local:8092/public`	none (mandatory)	Yes
`FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223`	Enable/disable the image content extraction module. When `false`, the `/image-content-extraction` endpoint is not registered.	`"true"`	`"false"`	No, but must be set to `"true"` to enable the feature
`FEATURE_FLAG_ENABLE_AGENTIC_INGESTION_PAGE_BATCH_FIGURE_EXTRACTION_UN_19457`	Enables page-batch figure extraction in `node-ingestion-worker`. When enabled and Image Content Extraction is active for a document, the worker sends the full page image plus all detected figure bounding boxes/captions to `agentic-ingestion` in one batched request instead of sending one cropped image per figure.	`"true"`	`"false"`	No
`FEATURE_FLAG_ENABLE_ATOMIC_FIGURE_CHUNKING_UN_19136`	Keeps extracted `<figure>` blocks atomic during markdown chunking in `node-ingestion-worker`, so figure captions and extracted image text stay together and are not split across chunks.	`"true"`	`"false"`	No
`REDIS_HOST`	Redis hostname for job queue and result storage	`redis-service.chat.svc` (from Key Vault)	`""`	Yes
`REDIS_PORT`	Redis port	`6379` (from Key Vault)	`10101`	No
`REDIS_PASSWORD`	Redis password	(from Key Vault)	`null`	Yes
`REDIS_USERNAME`	Redis username (if ACL is enabled)	`default`	`null`	Yes
`REDIS_USE_TLS`	Enable TLS for Redis (auto-enabled if port is 6380)	`"true"`	`"false"`	No
If Redis runs in Cluster Mode
`REDIS_CLUSTER_MODE`	Enable Redis cluster mode instead of standalone	`"false"`	`"false"`	No
`REDIS_CLUSTER_NODES`	JSON array of cluster nodes (required if cluster mode is on)	`[{"host": "redis-0", "port": 6379}]`	`""`	Yes, if in Cluster mode
`REDIS_CLUSTER_PASSWORD`	Redis cluster password	(from Key Vault)	`null`	Yes, if in Cluster mode
`REDIS_CLUSTER_ENABLE_TLS`	Enable TLS for Redis cluster connections	`"false"`	`"false"`	No
`REDIS_TLS_REJECT_UNAUTHORIZED`	Reject unauthorized TLS certificates for Redis	`"true"`	`"true"`	No
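To make the interplay of the Redis variables concrete, here is a hedged sketch of how they could be read and validated before deployment. The function and the returned dictionary keys are hypothetical and do not mirror the service's own settings loader; only the variable names, defaults, and the "TLS is auto-enabled on port 6380" rule come from the table above.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def _as_bool(value: str) -> bool:
    return value.strip().lower() == "true"


def load_redis_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the Redis env vars from the table above and apply its documented rules."""
    env = os.environ if env is None else env
    port = int(env.get("REDIS_PORT", "10101"))
    settings: Dict[str, Any] = {
        "host": env.get("REDIS_HOST", ""),
        "port": port,
        "username": env.get("REDIS_USERNAME"),
        "password": env.get("REDIS_PASSWORD"),
        # TLS is auto-enabled when the port is 6380.
        "use_tls": _as_bool(env.get("REDIS_USE_TLS", "false")) or port == 6380,
        "cluster_mode": _as_bool(env.get("REDIS_CLUSTER_MODE", "false")),
    }
    if settings["cluster_mode"]:
        raw_nodes = env.get("REDIS_CLUSTER_NODES", "")
        if not raw_nodes:
            raise ValueError("REDIS_CLUSTER_NODES is required in cluster mode")
        settings["cluster_nodes"] = json.loads(raw_nodes)  # JSON array of {host, port}
    elif not settings["host"]:
        raise ValueError("REDIS_HOST is required in standalone mode")
    return settings
```

Running a check like this against the rendered Helm values catches a missing `REDIS_CLUSTER_NODES` or malformed JSON before the pod starts.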

Optional environment variables:

Env var	Description	Example	Default	Required
`IMAGE_CONTENT_EXTRACTION_MAX_WORKERS`	Max concurrent async task workers for image extraction jobs	`4`	`4`	No
`IMAGE_CONTENT_EXTRACTION_CHAT_COMPLETION_TIMEOUT`	Timeout in ms for each LLM vision completion call	`240000`	`240000`	No
`IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS`	TTL in seconds for job results stored in Redis	`3600`	`3600`	No
`IMAGE_CONTENT_EXTRACTION_TIMEOUT`	Timeout in seconds when polling for a task result internally	`2.0`	`2.0`	No
`LOG_LEVEL`	Application log level	`INFO`	`INFO`	No
`MAX_CONTENT_LENGTH`	Max request body size in bytes	`67108864`	`67108864` (64 MB)	No
`CUSTOM_CA_CERT_PATH`	Path to custom CA certificate for outbound TLS and Redis CA	`/etc/ssl/certs/ca.pem`	`null`	No

Secrets (via Key Vault / secret provider):

The following are typically wired via the Helm chart's secretProvider section from Azure Key Vault:

REDIS_HOST, REDIS_PASSWORD, REDIS_PORT — Redis connection credentials

Verification: After deployment, confirm the service is healthy by checking GET /probe and that the image extraction module is registered in the startup logs.

B. Configure node-ingestion-worker

The node-ingestion-worker needs to know where the agentic-ingestion service is:

Env var: AGENTIC_INGESTION_BASE_URL — Base URL of the agentic-ingestion service (e.g., http://agentic-ingestion.chat.svc:8081)
Where: In the env: section of the node-ingestion-worker's Helm values

Env vars on node-ingestion-worker (apart from AGENTIC_INGESTION_BASE_URL, these are optional tuning values with sensible defaults that normally do not need to be set):

Env var	Description	Example	Default	Required
`AGENTIC_INGESTION_BASE_URL`	Base URL of the agentic-ingestion service	`http://agentic-ingestion.chat.svc:8081`	`""`	No, but must be set to enable the feature
`AGENTIC_INGESTION_IMAGE_POLLING_DURATION_MS`	Polling interval for async job status checks	`3000`	`3000`	No
`AGENTIC_INGESTION_IMAGE_TIMEOUT_MS`	Max wait time for a single figure extraction job	`300000`	`300000`	No
`AGENTIC_INGESTION_IMAGE_MAX_RETRIES`	Retries per HTTP request	`3`	`3`	No
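How the polling interval and timeout interact can be sketched as below. This is an assumption-laden illustration: the `status` field and its values (`"completed"`, `"failed"`) are hypothetical, since the job response schema is not documented here, and `fetch_status` stands in for the GET call on the job. Only the default values (3000 ms, 300000 ms) come from the table above.

```python
import time
from typing import Any, Callable, Dict


def poll_job(
    fetch_status: Callable[[], Dict[str, Any]],
    polling_ms: int = 3000,      # AGENTIC_INGESTION_IMAGE_POLLING_DURATION_MS
    timeout_ms: int = 300000,    # AGENTIC_INGESTION_IMAGE_TIMEOUT_MS
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job finishes, fails, or the next wait would exceed the timeout."""
    deadline = clock() + timeout_ms / 1000
    while True:
        job = fetch_status()
        status = job.get("status")  # hypothetical field name
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError("figure extraction job failed")
        if clock() + polling_ms / 1000 > deadline:
            raise TimeoutError("figure extraction job timed out")
        sleep(polling_ms / 1000)
```

With the defaults, a single figure is checked at most about 100 times before the worker gives up and the figure falls back to empty text.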

C. Enable the UI feature flag and configure language models in the knowledge-upload and admin apps

The knowledge-upload and admin apps have a feature flag that controls whether the Image Content Extraction configuration section is visible in the folder ingestion settings UI. Without this flag, admins cannot enable the feature through the UI.

Feature flag:

Env var	Description	Example	Default	Required
`FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223`	Feature flag to enable the UI section	`"true"`	`"false"`	No, but must be set to `"true"` to enable the feature
`IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS`	Available language models shown in the UI dropdown for image content extraction	`"AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)"`	`""`	No, but must be set to enable the feature

The feature flag must be set to "true" in the env: section of each service's Helm values.

Language model options:

The available language models shown in the UI dropdown are configured via an environment variable on knowledge-upload and admin:

Env var: IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS
Where: In the env: section of knowledge-upload's and admin’s Helm values
Format: Comma-separated MODEL_KEY:Display Label pairs
Examples:

IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)"

IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)"

Key points:

The MODEL_KEY values must match model identifiers available via the platform API gateway (API_BASE / node-chat). The model must have Vision and Structured Output support. We recommend one of the following models:
- AZURE_GPT_51_2025_1113 ("AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)")
  - This model performs best. However, it is not yet available in Switzerland and requires the data processing to take place in the EU.
- AZURE_GPT_4o_2024_0513 ("AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)")
Only models listed in this env var will appear in the UI dropdown — if the env var is empty or unset, no models are shown and the feature cannot be configured
There is no server-side allowlist beyond this at the moment; the backend accepts any string as languageModel. The agentic-ingestion service will log an error when the language model is not available.
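The `MODEL_KEY:Display Label` format can be checked with a small parser before it goes into the Helm values. This is a sketch under assumptions: the function name is hypothetical and the real UI parser may be more lenient or stricter. It splits each entry on the first colon only, and it assumes display labels contain no commas.

```python
from typing import List, Tuple


def parse_language_models(raw: str) -> List[Tuple[str, str]]:
    """Parse comma-separated MODEL_KEY:Display Label pairs into (key, label) tuples."""
    models: List[Tuple[str, str]] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty value or trailing commas
        key, sep, label = entry.partition(":")
        if not sep or not key.strip() or not label.strip():
            raise ValueError(f"expected MODEL_KEY:Display Label, got {entry!r}")
        models.append((key.strip(), label.strip()))
    return models


value = "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13),AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)"
for key, label in parse_language_models(value):
    print(key, "->", label)
```

An empty result corresponds to the "no models shown" case described above, where the feature cannot be configured through the UI.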

Behavior when the flag is off: If FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is "false" (or unset), the knowledge-upload UI hides the Image Content Extraction section and strips any existing imageContentExtraction config from saves, effectively disabling the feature for any newly saved ingestion configurations.
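The flag-off save behavior amounts to dropping one key from the ingestion configuration. A minimal sketch, assuming the configuration has the `pdfConfig.imageContentExtraction` shape shown in the JSON examples in this page; the function name is hypothetical and this is not the knowledge-upload code.

```python
import copy
from typing import Any, Dict


def sanitize_ingestion_config(config: Dict[str, Any], flag_enabled: bool) -> Dict[str, Any]:
    """Return a copy of the ingestion config as it would be saved under the given flag state."""
    result = copy.deepcopy(config)
    if not flag_enabled:
        pdf_config = result.get("pdfConfig")
        if isinstance(pdf_config, dict):
            # Flag off: any existing imageContentExtraction block is stripped on save.
            pdf_config.pop("imageContentExtraction", None)
    return result
```

Note that this only takes effect on a save: configurations that are never re-saved keep their stored `imageContentExtraction` block.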

D. Per-space / per-folder: Ingestion configuration

Once the UI flag is enabled (step C), admins can activate the feature per space or folder through the Knowledge Base or Space Management UI.

Do not use CUSTOM_SINGLE_PAGE_API: it activates the deprecated Agentic PDF Document Extraction path, which is not being released. If it was previously enabled, disable it.

pdfReadMode on the folder or Space File Ingestion Configuration must be DOC_INTELLIGENCE_DEFAULT — this is the standard MDI pipeline enhanced with figure extraction
languageModel must be a vision-capable model available via the platform API gateway

Enable Image Content Extraction on a folder in the Knowledge Base:

Navigate to the folder's Ingestion Configuration settings
The Image Content Extraction section will be visible (gated by the feature flag)
Toggle it on and select a language model from the dropdown

Enable Image Content Extraction for documents uploaded in a Space

Navigate to the Space Management
Select the Space you want to configure
Depending on your Space setup:
1. Unique Custom Space
  1. Click on Advanced Settings and add the ingestionConfig as shown below
    Note: enabled: true is the actual per-space toggle; setting this to false (or omitting it) keeps standard MDI behavior
    json
    {
      "ingestionConfig": {
        "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
        "pdfConfig": {
          "imageContentExtraction": {
            "enabled": true,
            "languageModel": "AZURE_GPT_4o_2024_0513"
          }
        }
      }
    }
2. Unique AI Space
  1. Navigate to Configuration > Advanced Settings > Optimization > Configure File Ingestion
  2. Make sure PDF mode is set to “Doc Intelligence Default”
  3. The Image Content Extraction section will be visible (gated by the feature flag)
  4. Toggle it on and select a language model from the dropdown

Migrating from Agentic PDF Document Extraction

If your tenant currently uses Agentic PDF Document Extraction, it is likely configured with:

pdfReadMode: CUSTOM_SINGLE_PAGE_API
customApiOptions
an Agentic Ingestion / Unique Text and Image Extraction API identifier

To migrate to Image Content Extraction:

Remove the customApiOptions entry for Agentic PDF Document Extraction.
Change pdfReadMode to DOC_INTELLIGENCE_DEFAULT.
Add pdfConfig.imageContentExtraction.enabled: true.
Set pdfConfig.imageContentExtraction.languageModel to a supported vision model.
Ensure AGENTIC_INGESTION_BASE_URL is configured on node-ingestion-worker.
Ensure FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is enabled where required.
Re-ingest affected documents if you want existing content to benefit from image extraction.
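The configuration part of the migration (the first four steps) can be expressed as a small transformation. This sketch rests on assumptions: the function name is hypothetical, `customApiOptions` is assumed to be a top-level key of the ingestion config, and the whole key is removed. If a tenant has other custom API entries, only the Agentic PDF Document Extraction entry should be filtered out instead.

```python
import copy
from typing import Any, Dict


def migrate_to_image_content_extraction(
    ingestion_config: Dict[str, Any],
    language_model: str,
) -> Dict[str, Any]:
    """Return a migrated copy of an ingestion config; the input is left untouched."""
    migrated = copy.deepcopy(ingestion_config)
    migrated.pop("customApiOptions", None)  # assumption: drop the whole key
    migrated["pdfReadMode"] = "DOC_INTELLIGENCE_DEFAULT"
    pdf_config = migrated.setdefault("pdfConfig", {})
    pdf_config["imageContentExtraction"] = {
        "enabled": True,
        "languageModel": language_model,
    }
    return migrated
```

The env var and feature-flag steps still have to be done on the services themselves, and re-ingestion is a separate action.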

Prerequisites / dependencies

Prerequisite	Details
agentic-ingestion service deployed and healthy	Must be deployed via Helm with Redis connectivity and `API_BASE` configured. This is the first deployment of this service if Agentic PDF Extraction was not previously enabled.
Redis	Required for the async job queue (`taskiq:image-content-extraction`). A shared Redis instance can be used, or the service can connect to the existing platform Redis (standalone or cluster mode).
Azure Document Intelligence (MDI)	Required in node-ingestion-worker for figure detection (`extractFigures: true`). This is the same MDI dependency used by the standard `DOC_INTELLIGENCE_DEFAULT` pipeline — no new MDI setup required.
Azure OpenAI vision model	A vision-capable model (e.g., `AZURE_GPT_4o_2024_0513`, `AZURE_GPT_51_2025_1113`) must be available via the platform API gateway (`API_BASE` / node-chat).
`AGENTIC_INGESTION_BASE_URL` on node-ingestion-worker	Must be set so the adapter can reach the agentic-ingestion service. This is a new env var on node-ingestion-worker for this release.
`FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223` on agentic-ingestion and knowledge-upload	Must be set to `"true"` to enable the backend service and expose the UI configuration section. Without this, admins cannot enable the feature through the UI.
`IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS` on knowledge-upload	Must be set with the available model options (comma-separated `MODEL_KEY:Label` pairs). Without this, the UI dropdown is empty and the feature cannot be configured via the UI.
No new Key Vault secrets	N/A — the service uses the existing Redis credentials.
No new routes / hostnames	N/A — internal cluster communication only (node-ingestion-worker → agentic-ingestion over HTTP).
No new workload identity	N/A — uses existing service identity.

Owner / point of contact

Role	Name / handle
Owner / POC	DATA SCIENCE

Rollback / disable steps

Quick disable (no redeployment needed)

The fastest way to disable the feature for a specific space/folder is to update the ingestion configuration via the UI or through the space configuration:

json

{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "pdfConfig": {
      "imageContentExtraction": {
        "enabled": false
      }
    }
  }
}

Disable UI-wide (feature flag)

To hide the feature from all admins and prevent new configurations:

Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223: "false" on knowledge-upload
Re-apply the Helm release
The Image Content Extraction section disappears from the UI
Any subsequent save of ingestion config will automatically strip existing imageContentExtraction settings

Note: Existing spaces that already have imageContentExtraction.enabled: true in their stored config will continue to use image extraction until someone re-saves their ingestion config (at which point the flag-off logic strips it). To force-disable for all spaces immediately, also disable the agentic-ingestion module (see below).

Disable service-wide (agentic-ingestion)

To disable the extraction module across all tenants at the service level:

Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223: "false" in the agentic-ingestion Helm values
Re-apply the Helm release
The /image-content-extraction endpoint will no longer be registered
node-ingestion-worker will fall back to standard MDI output (no figure text) for any spaces that had it enabled

Helm rollback

If the agentic-ingestion chart was upgraded specifically for this feature:

bash

helm rollback <release-name> -n <namespace>

Behavior on disable

In-flight jobs: Any jobs already in the Redis queue will be processed by remaining workers, or will expire after IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS (default: 3600s)
Graceful fallback: node-ingestion-worker handles agentic-ingestion being unavailable gracefully — it falls back to standard MDI output for the page (text-only, no figure content)
No data loss: Disabling the feature does not affect previously ingested documents

Known limitations / side effects

Limitation	Impact
Limited extraction quality
MDI figure detection and cropping are not always reliable	Bounding boxes can be inaccurate, especially for large or complex figures. This can lead to incorrect splitting of full-page charts or crops that miss important visual context such as titles, legends, and footnotes.
Duplicate text can appear	MDI runs OCR on the full page, including text inside figures. Once we run image content extraction on the cropped figure, the same text may appear twice.
Chunking can break figure content	Extracted figure content can be split across multiple chunks because of hard chunk boundaries, which hurts readability and retrieval.
Prompt design has a precision vs reliability trade-off	Prompts that push for detailed numeric extraction can improve results on some charts, but on others they can produce wrong estimates or empty outputs when the figure does not support that level of detail.
Handwritten content in figures	May not be reliably extracted by the vision model.
Very small figures (< ~50px)	May produce low-quality or empty extractions.
Other limitations
Agentic PDF Document Extraction is deprecated	The older `CUSTOM_SINGLE_PAGE_API` / Agentic PDF Extraction path should not be used. It is superseded by this feature, which achieves better results by enhancing the standard MDI pipeline rather than replacing it.
Increases per-page processing time	Each figure adds ~5–15 seconds of processing (vision LLM call). Pages with 5+ figures can add 25–35 seconds due to parallel extraction (max 5 concurrent).
Increases LLM API costs	Each figure requires a vision model API call. Documents with many figures (e.g., financial presentations) will incur noticeably higher LLM costs.

Experimental disclaimer / support expectations

When to use: Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide.
Breaking changes: None expected. The feature is additive — it enhances MDI output without changing the standard pipeline behavior when disabled.
Deprecation: Agentic PDF Document Extraction (CUSTOM_SINGLE_PAGE_API with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement.