Agentic Image Content Extraction for Infra Admins

This feature is currently in BETA. It may change before general availability in response to user and client feedback, but it is targeted to be high quality and stable. Documentation may lag behind feature updates. Use in production environments at your own discretion. Please refer to our Upgrade and Release Process for more information.

Overview

| Field | Value |
| --- | --- |
| Feature name | Agentic Image Content Extraction |
| What it is | Vision AI figure extraction within the MDI pipeline (supersedes the deprecated Agentic PDF Document Extraction) |
| What it does | Extracts textual content from figures, charts, diagrams, and other visual elements detected within PDF pages during ingestion. Uses vision-capable AI models (Azure OpenAI) to interpret cropped figure images and merges the extracted text back into the page markdown at the correct position. Enhances the standard MDI (Microsoft Document Intelligence) pipeline; it does not replace it. |
| Who it's for | Knowledge base admins who need higher-quality ingestion of image-heavy documents (financial reports, research papers, regulatory filings). End users benefit from charts and graphs becoming searchable in AI chat. |
| When to use | Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide. |
| Deprecation | Agentic PDF Document Extraction (CUSTOM_SINGLE_PAGE_API with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement. |

Processing Flow

View diagram “image-extraction-flow” in Confluence

Step-by-step:

  1. node-ingestion-worker splits the PDF into individual pages

  2. For each page, the MDI Client calls Azure Document Intelligence with extractFigures: true

  3. MDI returns the page text/layout plus figure bounding polygons for each detected figure

  4. The MDI Page Composer renders the PDF page as an image at the configured DPI

  5. Each detected figure is cropped from the rendered page image using the polygon coordinates

  6. Cropped figure images are sent, together with the language detected by MDI, to the Image Extraction Adapter (up to 5 in parallel)

  7. The adapter creates async jobs via POST /image-content-extraction/extractions and polls for results

  8. Inside agentic-ingestion, the worker picks up each job and applies the configured strategy:

    • ONE_STEP — Single vision LLM call to directly extract content from the figure

    • TWO_STEP — First classify the image (chart, table, diagram, icon, etc.), then extract with a category-specific prompt; non-informational categories (icons, decorative images) are skipped

  9. The vision LLM is called via API_BASE (node-chat gateway). If the primary model fails, an automatic fallback to a secondary model is attempted

  10. Results are returned to the composer, which merges figure texts into the page markdown at the correct positions, preserving reading order and captions
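Steps 4 and 5 come down to a unit conversion from polygon coordinates to pixels. The following is a minimal sketch, assuming the polygon arrives as a flat list [x1, y1, x2, y2, ...] in inches (the unit Document Intelligence typically reports for PDF input) and that the page was rendered at the configured DPI. The function name and the padding parameter are illustrative and are not taken from node-ingestion-worker.

```python
import math
from typing import Sequence, Tuple


def polygon_to_crop_box(
    polygon: Sequence[float],
    dpi: int,
    page_width_px: int,
    page_height_px: int,
    padding_px: int = 0,
) -> Tuple[int, int, int, int]:
    """Convert a flat polygon in inches to a pixel box (left, top, right, bottom)."""
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold at least three (x, y) pairs")
    xs = polygon[0::2]
    ys = polygon[1::2]
    # Inches -> pixels: multiply by the render DPI, rounding outwards so
    # the crop never cuts into the figure.
    left = math.floor(min(xs) * dpi) - padding_px
    top = math.floor(min(ys) * dpi) - padding_px
    right = math.ceil(max(xs) * dpi) + padding_px
    bottom = math.ceil(max(ys) * dpi) + padding_px
    # Clamp to the rendered page so the box is always valid.
    return (
        max(left, 0),
        max(top, 0),
        min(right, page_width_px),
        min(bottom, page_height_px),
    )
```

A small padding can help when a crop would otherwise miss a title or legend that sits just outside the detected polygon, at the cost of occasionally pulling in neighbouring text.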

Fallback Behavior

  • Primary → Fallback model: If the primary LLM call fails for any reason other than a content-filter rejection, the request is retried with the fallback model

  • Per-figure resilience: If extraction fails for an individual figure → empty text for that figure; rest of page composes normally

  • Pipeline fallback: If entire figure extraction pipeline fails → falls back to standard MDI output (no figure text)
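The first two fallback levels can be sketched as follows. This is an illustration only: in the real system the model fallback happens inside agentic-ingestion and the per-figure resilience inside node-ingestion-worker, whereas the sketch combines them in one place. ContentFilterError, call_model, and both function names are hypothetical.

```python
from typing import Callable, List


class ContentFilterError(Exception):
    """Hypothetical error type for a content-filter rejection."""


def extract_with_fallback(
    call_model: Callable[[str, bytes], str],
    image: bytes,
    primary: str,
    fallback: str,
) -> str:
    """Try the primary model, then the fallback, except on content-filter errors."""
    try:
        return call_model(primary, image)
    except ContentFilterError:
        # A content-filter rejection is not retried with the fallback model.
        raise
    except Exception:
        return call_model(fallback, image)


def extract_page_figures(
    call_model: Callable[[str, bytes], str],
    figures: List[bytes],
    primary: str,
    fallback: str,
) -> List[str]:
    """Per-figure resilience: a failed figure yields "" and the rest continue."""
    texts: List[str] = []
    for image in figures:
        try:
            texts.append(extract_with_fallback(call_model, image, primary, fallback))
        except Exception:
            texts.append("")
    return texts
```

Returning an empty string for a failed figure keeps the list aligned with the detected figures, so the page can still be composed in reading order.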

Code Path (node-ingestion-worker)

PDF Ingestor Service
  └─ pdfReadMode === DOC_INTELLIGENCE_DEFAULT
      └─ imageContentExtraction.enabled === true
          └─ applyDocIntelligenceOnPageWithImageContentExtraction()
              └─ MSDocumentIntelligence.analyzeDocument(page, { extractFigures: true })
                  └─ MdiPageComposer.composePageWithFigures()
                      ├─ Render PDF page as image (at configured DPI)
                      ├─ Crop each figure using MDI polygon coordinates
                      └─ AgenticIngestionImageExtractionAdapter.extractImageContent()
                          ├─ POST /image-content-extraction/extractions
                          └─ Poll GET /image-content-extraction/extractions/{job_id}
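The polling half of the adapter can be sketched as a simple loop. This is a minimal sketch, not the adapter's actual code: fetch_job stands in for the GET request, the status names "completed" and "failed" are assumptions about the job payload, and the defaults mirror AGENTIC_INGESTION_IMAGE_POLLING_DURATION_MS and AGENTIC_INGESTION_IMAGE_TIMEOUT_MS.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_job: Callable[[], Dict[str, Any]],
    polling_interval_ms: int = 3000,
    timeout_ms: int = 300000,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state or the timeout elapses."""
    deadline = clock() + timeout_ms / 1000
    while True:
        job = fetch_job()
        # "completed" and "failed" are assumed status names for this sketch.
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError("figure extraction job did not finish in time")
        sleep(polling_interval_ms / 1000)
```

With the default values a single figure is polled at most about 100 times before the worker gives up on it. The sleep and clock parameters exist only so the loop can be exercised without waiting.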

Code Path (agentic-ingestion)

POST /image-content-extraction/extractions
  └─ Enqueue in Redis (taskiq:image-content-extraction)
      └─ Worker: process_image_extraction_job()
          └─ run_image_extraction()
              ├─ ONE_STEP → ClassifierAndExtractor → single vision LLM call
              └─ TWO_STEP → Classifier (categorize image)
                  ├─ Non-informational category → skip (return empty)
                  └─ Informational category → Extractor (category-specific prompt)
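The difference between the two strategies is easiest to see as a dispatch function. The sketch below is illustrative and does not reproduce run_image_extraction(): the category names, the "generic" prompt key, and the classify and extract callables are hypothetical stand-ins for the vision LLM calls.

```python
from typing import Callable

# Hypothetical category names; the real classifier's label set may differ.
NON_INFORMATIONAL = {"icon", "decorative"}


def run_extraction(
    strategy: str,
    image: bytes,
    classify: Callable[[bytes], str],
    extract: Callable[[bytes, str], str],
) -> str:
    """Dispatch between the ONE_STEP and TWO_STEP strategies."""
    if strategy == "ONE_STEP":
        # One vision call with a generic prompt.
        return extract(image, "generic")
    if strategy == "TWO_STEP":
        category = classify(image)
        if category in NON_INFORMATIONAL:
            return ""  # skipped: nothing worth extracting
        # The second call uses a prompt chosen by category.
        return extract(image, category)
    raise ValueError(f"unknown strategy: {strategy}")
```

TWO_STEP costs an extra call for informational figures but spends only the classification call on icons and decorative images.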

How to enable it

Enabling this feature requires changes in four places: deploying and configuring the agentic-ingestion service, configuring node-ingestion-worker, enabling the UI feature flag on knowledge-upload, and activating the feature per space / folder.

A. Deploy the agentic-ingestion service

The agentic-ingestion service must be deployed and running before the feature can be used. Image content extraction is a module within this service — no separate deployment is needed, but the service itself is a prerequisite.

  • Helm chart: The agentic-ingestion service is deployed via its own Helm chart as part of the Agentic Ingestion bundle

Environment variables required for setup:

| Env var | Description | Example | Default | Required |
| --- | --- | --- | --- | --- |
| API_BASE | Base URL for the Unique AI API (node-chat). Used for all LLM vision completions. | http://node-chat.finance-gpt.svc.cluster.local:8092/public | none (mandatory) | Yes |
| FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 | Enable/disable the image content extraction module. When false, the /image-content-extraction endpoint is not registered. | "true" | "false" | No, but must be set to "true" to enable the feature |
| FEATURE_FLAG_ENABLE_AGENTIC_INGESTION_PAGE_BATCH_FIGURE_EXTRACTION_UN_19457 | Enables page-batch figure extraction in node-ingestion-worker. When enabled and Image Content Extraction is active for a document, the worker sends the full page image plus all detected figure bounding boxes/captions to agentic-ingestion in one batched request instead of sending one cropped image per figure. | "true" | "false" | No |
| FEATURE_FLAG_ENABLE_ATOMIC_FIGURE_CHUNKING_UN_19136 | Keeps extracted <figure> blocks atomic during markdown chunking in node-ingestion-worker, so figure captions and extracted image text stay together and are not split across chunks. | "true" | "false" | No |
| REDIS_HOST | Redis hostname for job queue and result storage | redis-service.chat.svc (from Key Vault) | "" | Yes |
| REDIS_PORT | Redis port | 6379 (from Key Vault) | 10101 | No |
| REDIS_PASSWORD | Redis password | (from Key Vault) | null | Yes |
| REDIS_USERNAME | Redis username (if ACL is enabled) | default | null | Yes |
| REDIS_USE_TLS | Enable TLS for Redis (auto-enabled if port is 6380) | "true" | "false" | No |

If Redis runs in Cluster Mode:

| Env var | Description | Example | Default | Required |
| --- | --- | --- | --- | --- |
| REDIS_CLUSTER_MODE | Enable Redis cluster mode instead of standalone | "false" | "false" | No |
| REDIS_CLUSTER_NODES | JSON array of cluster nodes (required if cluster mode is on) | [{"host": "redis-0", "port": 6379}] | "" | Yes, if in Cluster mode |
| REDIS_CLUSTER_PASSWORD | Redis cluster password | (from Key Vault) | null | Yes, if in Cluster mode |
| REDIS_CLUSTER_ENABLE_TLS | Enable TLS for Redis cluster connections | "false" | "false" | No |
| REDIS_TLS_REJECT_UNAUTHORIZED | Reject unauthorized TLS certificates for Redis | "true" | "true" | No |
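A pre-deployment check of these settings might look like the sketch below. It is not part of agentic-ingestion; the function names are illustrative, and it assumes that REDIS_HOST, REDIS_PASSWORD, and REDIS_USERNAME are only needed in standalone mode, which the table does not state explicitly.

```python
import json
from typing import Dict, List


def check_agentic_ingestion_env(env: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the required settings."""
    problems: List[str] = []
    if not env.get("API_BASE"):
        problems.append("API_BASE is required")
    cluster = env.get("REDIS_CLUSTER_MODE", "false").lower() == "true"
    if cluster:
        raw_nodes = env.get("REDIS_CLUSTER_NODES", "")
        try:
            nodes = json.loads(raw_nodes) if raw_nodes else []
        except json.JSONDecodeError:
            nodes = None
        if not isinstance(nodes, list) or not nodes:
            problems.append("REDIS_CLUSTER_NODES must be a non-empty JSON array")
        if not env.get("REDIS_CLUSTER_PASSWORD"):
            problems.append("REDIS_CLUSTER_PASSWORD is required in cluster mode")
    else:
        # Assumption: these three apply to standalone mode only.
        for name in ("REDIS_HOST", "REDIS_PASSWORD", "REDIS_USERNAME"):
            if not env.get(name):
                problems.append(f"{name} is required")
    return problems


def redis_tls_enabled(env: Dict[str, str]) -> bool:
    """TLS is on if requested explicitly or if the port is 6380."""
    explicit = env.get("REDIS_USE_TLS", "false").lower() == "true"
    return explicit or env.get("REDIS_PORT") == "6380"
```

Passing dict(os.environ) to check_agentic_ingestion_env gives a quick list of missing values before the Helm release is applied.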

Optional environment variables:

| Env var | Description | Example | Default | Required |
| --- | --- | --- | --- | --- |
| IMAGE_CONTENT_EXTRACTION_MAX_WORKERS | Max concurrent async task workers for image extraction jobs | 4 | 4 | No |
| IMAGE_CONTENT_EXTRACTION_CHAT_COMPLETION_TIMEOUT | Timeout in ms for each LLM vision completion call | 240000 | 240000 | No |
| IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS | TTL in seconds for job results stored in Redis | 3600 | 3600 | No |
| IMAGE_CONTENT_EXTRACTION_TIMEOUT | Timeout in seconds when polling for a task result internally | 2.0 | 2.0 | No |
| LOG_LEVEL | Application log level | INFO | INFO | No |
| MAX_CONTENT_LENGTH | Max request body size in bytes | 67108864 | 67108864 (64 MB) | No |
| CUSTOM_CA_CERT_PATH | Path to custom CA certificate for outbound TLS and Redis CA | /etc/ssl/certs/ca.pem | null | No |

Secrets (via Key Vault / secret provider):

The following are typically wired via the Helm chart's secretProvider section from Azure Key Vault:

  • REDIS_HOST, REDIS_PASSWORD, REDIS_PORT — Redis connection credentials

Verification: After deployment, confirm the service is healthy by checking GET /probe and that the image extraction module is registered in the startup logs.

B. Configure node-ingestion-worker

The node-ingestion-worker needs to know where the agentic-ingestion service is:

  • Env var: AGENTIC_INGESTION_BASE_URL — Base URL of the agentic-ingestion service (e.g., http://agentic-ingestion.chat.svc:8081)

  • Where: In the env: section of the node-ingestion-worker's Helm values

Environment variables for node-ingestion-worker (the tuning variables have sensible defaults and normally do not need to be set):

| Env var | Description | Example | Default | Required |
| --- | --- | --- | --- | --- |
| AGENTIC_INGESTION_BASE_URL | Base URL of the agentic-ingestion service | http://agentic-ingestion.chat.svc:8081 | "" | No, but must be set to enable the feature |
| AGENTIC_INGESTION_IMAGE_POLLING_DURATION_MS | Polling interval for async job status checks | 3000 | 3000 | No |
| AGENTIC_INGESTION_IMAGE_TIMEOUT_MS | Max wait time for a single figure extraction job | 300000 | 300000 | No |
| AGENTIC_INGESTION_IMAGE_MAX_RETRIES | Retries per HTTP request | 3 | 3 | No |

C. Enable the UI feature flag and configure language models in the knowledge-upload app

The knowledge-upload app has a feature flag that controls whether the Image Content Extraction configuration section is visible in the folder ingestion settings UI. Without this flag, admins cannot enable the feature through the UI.

Feature flag:

| Env var | Description | Example | Default | Required |
| --- | --- | --- | --- | --- |
| FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 | Feature flag to enable the UI section | "true" | "false" | No, but must be set to "true" to enable the feature |
| IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS | Available language models shown in the UI dropdown for image content extraction | "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)" | "" | No, but must be set to enable the feature |

FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 must be set to "true" in the env: section of knowledge-upload's Helm values.

Language model options:

The available language models shown in the UI dropdown are configured via an environment variable on knowledge-upload:

  • Env var: IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS

  • Where: In the env: section of knowledge-upload's Helm values

  • Format: Comma-separated MODEL_KEY:Display Label pairs

  • Examples:

IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)"
IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS: "AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)"
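To make the format concrete, the sketch below parses such a value into (key, label) pairs. It is written in Python purely for illustration and is not knowledge-upload's parser; it assumes that labels never contain a comma and splits each entry on the first colon only.

```python
from typing import List, Tuple


def parse_language_models(raw: str) -> List[Tuple[str, str]]:
    """Parse "KEY:Label,KEY:Label" into (key, label) pairs."""
    models: List[Tuple[str, str]] = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas
        # Split on the first colon only, so labels may contain colons.
        key, sep, label = entry.partition(":")
        if not sep or not key.strip() or not label.strip():
            raise ValueError(f"expected MODEL_KEY:Display Label, got {entry!r}")
        models.append((key.strip(), label.strip()))
    return models
```

Offering both example models in the dropdown would mean joining the two pairs with a comma in a single value.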

Key points:

  • The MODEL_KEY values must match model identifiers available via the platform API gateway (API_BASE / node-chat). The model must have Vision and Structured Output support. We recommend one of the following models:

    • AZURE_GPT_51_2025_1113 ("AZURE_GPT_51_2025_1113:GPT-5.1 (2025-11-13)")

      • This model performs best. However, it is not yet available in Switzerland, so data processing must take place in the EU.

    • AZURE_GPT_4o_2024_0513 ("AZURE_GPT_4o_2024_0513:GPT-4o (2024-05-13)")

  • Only models listed in this env var will appear in the UI dropdown — if the env var is empty or unset, no models are shown and the feature cannot be configured

  • There is no server-side allowlist beyond this at the moment; the backend accepts any string as languageModel. The agentic-ingestion service will log an error when the language model is not available.

Behavior when the flag is off: If FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is "false" (or unset), the knowledge-upload UI hides the Image Content Extraction section and strips any existing imageContentExtraction config from saves, effectively disabling the feature for any newly saved ingestion configurations.

D. Per-space / per-folder: Ingestion configuration

Once the UI flag is enabled (step C), admins can activate the feature per space or folder through the Knowledge Base or Space Management UI.

Note: Do not use CUSTOM_SINGLE_PAGE_API. It activates the deprecated Agentic PDF Document Extraction path, which is not being released. If it was previously enabled, disable it.

Info:
  • pdfReadMode on the folder or Space File Ingestion Configuration must be DOC_INTELLIGENCE_DEFAULT, which is the standard MDI pipeline enhanced with figure extraction

  • languageModel must be a vision-capable model available via the platform API gateway

Enable Image Content Extraction on a folder in the Knowledge Base:

  1. Navigate to the folder's Ingestion Configuration settings

  2. The Image Content Extraction section will be visible (gated by the feature flag)

  3. Toggle it on and select a language model from the dropdown

Enable Image Content Extraction for documents uploaded in a Space:

  1. Navigate to Space Management

  2. Select the Space you want to configure

  3. Depending on your Space setup:

    1. Unique Custom Space

      1. Click on Advanced Settings and add the ingestionConfig as shown below
        Note: enabled: true is the actual per-space toggle; setting this to false (or omitting it) keeps standard MDI behavior

        json
        {
          "ingestionConfig": {
            "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
            "pdfConfig": {
              "imageContentExtraction": {
                "enabled": true,
                "languageModel": "AZURE_GPT_4o_2024_0513"
              }
            }
          }
        }
    2. Unique AI Space

      1. Navigate to Configuration > Advanced Settings > Optimization > Configure File Ingestion

      2. Make sure PDF mode is set to “Doc Intelligence Default”

      3. The Image Content Extraction section will be visible (gated by the feature flag)

      4. Toggle it on and select a language model from the dropdown
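When a space does not seem to pick up the feature, the stored configuration can be checked against the requirements above. The following is a small illustrative check, assuming the configuration has the shape of the JSON example in this section; the function name is hypothetical.

```python
from typing import Any, Dict, List


def check_ingestion_config(config: Dict[str, Any]) -> List[str]:
    """Return reasons why image content extraction would not be active."""
    ingestion = config.get("ingestionConfig", {})
    extraction = ingestion.get("pdfConfig", {}).get("imageContentExtraction", {})
    reasons: List[str] = []
    if ingestion.get("pdfReadMode") != "DOC_INTELLIGENCE_DEFAULT":
        reasons.append("pdfReadMode must be DOC_INTELLIGENCE_DEFAULT")
    if extraction.get("enabled") is not True:
        reasons.append("imageContentExtraction.enabled must be true")
    if not extraction.get("languageModel"):
        reasons.append("imageContentExtraction.languageModel must be set")
    return reasons
```

An empty list means the configuration itself is complete; it does not confirm that the chosen model is actually available via the platform API gateway.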


Migrating from Agentic PDF Document Extraction

If your tenant currently uses Agentic PDF Document Extraction, it is likely configured with:

  • pdfReadMode: CUSTOM_SINGLE_PAGE_API

  • customApiOptions

  • an Agentic Ingestion / Unique Text and Image Extraction API identifier

To migrate to Image Content Extraction:

  1. Remove the customApiOptions entry for Agentic PDF Document Extraction.

  2. Change pdfReadMode to DOC_INTELLIGENCE_DEFAULT.

  3. Add pdfConfig.imageContentExtraction.enabled: true.

  4. Set pdfConfig.imageContentExtraction.languageModel to a supported vision model.

  5. Ensure AGENTIC_INGESTION_BASE_URL is configured on node-ingestion-worker.

  6. Ensure FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 is enabled where required.

  7. Re-ingest affected documents if you want existing content to benefit from image extraction.
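Steps 1 to 4 are a pure configuration change and can be sketched as a small transformation. This is a simplified illustration, not a supported migration tool: it assumes customApiOptions sits directly under ingestionConfig and removes the whole key, whereas a tenant with several custom API entries would remove only the Agentic PDF Document Extraction entry.

```python
import copy
from typing import Any, Dict


def migrate_ingestion_config(old: Dict[str, Any], language_model: str) -> Dict[str, Any]:
    """Return a copy of an ingestionConfig migrated to Image Content Extraction."""
    new = copy.deepcopy(old)
    # Assumption: customApiOptions sits directly under ingestionConfig.
    new.pop("customApiOptions", None)
    new["pdfReadMode"] = "DOC_INTELLIGENCE_DEFAULT"
    pdf_config = new.setdefault("pdfConfig", {})
    pdf_config["imageContentExtraction"] = {
        "enabled": True,
        "languageModel": language_model,
    }
    return new
```

The function works on a copy, so the previous configuration remains available if the change has to be reverted.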


Prerequisites / dependencies

| Prerequisite | Details |
| --- | --- |
| agentic-ingestion service deployed and healthy | Must be deployed via Helm with Redis connectivity and API_BASE configured. This is the first deployment of this service if Agentic PDF Extraction was not previously enabled. |
| Redis | Required for the async job queue (taskiq:image-content-extraction). A shared Redis instance can be used, or the service can connect to the existing platform Redis (standalone or cluster mode). |
| Azure Document Intelligence (MDI) | Required in node-ingestion-worker for figure detection (extractFigures: true). This is the same MDI dependency used by the standard DOC_INTELLIGENCE_DEFAULT pipeline, so no new MDI setup is required. |
| Azure OpenAI vision model | A vision-capable model (e.g., AZURE_GPT_4o_2024_0513, AZURE_GPT_51_2025_1113) must be available via the platform API gateway (API_BASE / node-chat). |
| AGENTIC_INGESTION_BASE_URL on node-ingestion-worker | Must be set so the adapter can reach the agentic-ingestion service. This is a new env var on node-ingestion-worker for this release. |
| FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223 on agentic-ingestion and knowledge-upload | Must be set to "true" to enable the backend service and expose the UI configuration section. Without this, admins cannot enable the feature through the UI. |
| IMAGE_CONTENT_EXTRACTION_LANGUAGE_MODELS on knowledge-upload | Must be set with the available model options (comma-separated MODEL_KEY:Label pairs). Without this, the UI dropdown is empty and the feature cannot be configured via the UI. |
| No new Key Vault secrets | N/A. The service uses the existing Redis credentials. |
| No new routes / hostnames | N/A. Internal cluster communication only (node-ingestion-worker → agentic-ingestion over HTTP). |
| No new workload identity | N/A. Uses the existing service identity. |


Owner / point of contact

| Role | Name / handle |
| --- | --- |
| Owner / POC | DATA SCIENCE |


Rollback / disable steps

Quick disable (no redeployment needed)

The fastest way to disable the feature for a specific space/folder is to update the ingestion configuration via the UI or through the space configuration:

json
{
  "ingestionConfig": {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "pdfConfig": {
      "imageContentExtraction": {
        "enabled": false
      }
    }
  }
}

Disable UI-wide (feature flag)

To hide the feature from all admins and prevent new configurations:

  1. Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223: "false" on knowledge-upload

  2. Re-apply the Helm release

  3. The Image Content Extraction section disappears from the UI

  4. Any subsequent save of ingestion config will automatically strip existing imageContentExtraction settings

Note: Existing spaces that already have imageContentExtraction.enabled: true in their stored config will continue to use image extraction until someone re-saves their ingestion config (at which point the flag-off logic strips it). To force-disable for all spaces immediately, also disable the agentic-ingestion module (see below).

Disable service-wide (agentic-ingestion)

To disable the extraction module across all tenants at the service level:

  1. Set FEATURE_FLAG_ENABLE_IMAGE_CONTENT_EXTRACTION_UN_17223: "false" in the agentic-ingestion Helm values

  2. Re-apply the Helm release

  3. The /image-content-extraction endpoint will no longer be registered

  4. node-ingestion-worker will fall back to standard MDI output (no figure text) for any spaces that had it enabled

Helm rollback

If the agentic-ingestion chart was upgraded specifically for this feature:

bash
helm rollback <release-name> -n <namespace>

Behavior on disable

  • In-flight jobs: Any jobs already in the Redis queue will be processed by remaining workers, or will expire after IMAGE_CONTENT_EXTRACTION_REDIS_JOB_TTL_SECONDS (default: 3600s)

  • Graceful fallback: node-ingestion-worker handles agentic-ingestion being unavailable gracefully — it falls back to standard MDI output for the page (text-only, no figure content)

  • No data loss: Disabling the feature does not affect previously ingested documents


Known limitations / side effects

Limited extraction quality

| Limitation | Impact |
| --- | --- |
| MDI figure detection and cropping are not always reliable | Bounding boxes can be inaccurate, especially for large or complex figures. This can lead to incorrect splitting of full-page charts or crops that miss important visual context such as titles, legends, and footnotes. |
| Duplicate text can appear | MDI runs OCR on the full page, including text inside figures. When image content extraction then runs on the cropped figure, the same text may appear twice. |
| Chunking can break figure content | Extracted figure content can be split across multiple chunks because of hard chunk boundaries, which hurts readability and retrieval. |
| Prompt design has a precision vs reliability trade-off | Prompts that push for detailed numeric extraction can improve results on some charts, but on others they can produce wrong estimates or empty outputs when the figure does not support that level of detail. |
| Handwritten content in figures | May not be reliably extracted by the vision model. |
| Very small figures (< ~50px) | May produce low-quality or empty extractions. |

Other limitations

| Limitation | Impact |
| --- | --- |
| Agentic PDF Document Extraction is deprecated | The older CUSTOM_SINGLE_PAGE_API / Agentic PDF Extraction path should not be used. It is superseded by this feature, which achieves better results by enhancing the standard MDI pipeline rather than replacing it. |
| Increases per-page processing time | Each figure adds ~5–15 seconds of processing (vision LLM call). Pages with 5+ figures add about 25–35 seconds rather than the full sequential total, because up to 5 figures are extracted in parallel. |
| Increases LLM API costs | Each figure requires a vision model API call. Documents with many figures (e.g., financial presentations) will incur noticeably higher LLM costs. |
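The chunking limitation is what FEATURE_FLAG_ENABLE_ATOMIC_FIGURE_CHUNKING_UN_19136 is meant to address. The idea of keeping a figure block atomic can be sketched as follows. This is not node-ingestion-worker's chunker: it assumes extracted figures are wrapped in <figure>...</figure> tags, splits plain text on blank lines, and measures chunk size in characters.

```python
import re
from typing import List

FIGURE_BLOCK = re.compile(r"<figure\b[^>]*>.*?</figure>", re.DOTALL)


def chunk_keeping_figures(markdown: str, max_chars: int) -> List[str]:
    """Split text into chunks of about max_chars without cutting a <figure> block."""
    # Break the text into units: whole figure blocks and plain paragraphs.
    units: List[str] = []
    pos = 0
    for match in FIGURE_BLOCK.finditer(markdown):
        units.extend(p for p in markdown[pos:match.start()].split("\n\n") if p.strip())
        units.append(match.group(0))
        pos = match.end()
    units.extend(p for p in markdown[pos:].split("\n\n") if p.strip())

    chunks: List[str] = []
    current = ""
    for unit in units:
        candidate = f"{current}\n\n{unit}" if current else unit
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = unit  # an oversized figure still stays in one piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The trade-off is that a chunk holding a large figure can exceed max_chars, since the figure is never split to fit the limit.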


Experimental disclaimer / support expectations

  • When to use: Pilot tenants with image-heavy document sets (financial reports, research papers, regulatory filings) where chart/graph content is currently lost during ingestion. Recommended to test on a sample folder before enabling space-wide.

  • Breaking changes: None expected. The feature is additive — it enhances MDI output without changing the standard pipeline behavior when disabled.

  • Deprecation: Agentic PDF Document Extraction (CUSTOM_SINGLE_PAGE_API with the Agentic Ingestion API identifier) is deprecated and should not be enabled for new tenants. Image Content Extraction is the recommended replacement.
