Agentic Image Content Extraction Configuration
Agentic Image Content Extraction enriches PDF ingestion by extracting text from images, charts, diagrams, and figures found inside PDF pages.
How It Fits Into Ingestion
When Image Content Extraction is enabled:
The PDF is processed with Microsoft Document Intelligence.
Figures are detected on each page.
The page or figure image is sent internally to `agentic-ingestion`.
A vision-capable language model extracts the meaningful content.
The extracted text is inserted back into the ingested page content.
The normal chunking and embedding flow continues.
Image Content Extraction is therefore an enrichment step inside PDF ingestion, not a separate ingestion mode.
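The enrichment flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the `agentic-ingestion` implementation: the `Page` type, the `enrich_pages` function, and the `extract_image_content` callable (a stand-in for the vision model call) are hypothetical, and appending the extracted text after the page text is an assumption about where the content is inserted.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical page record produced by the Document Intelligence step."""
    number: int
    text: str
    figures: List[bytes] = field(default_factory=list)  # detected figure images


def enrich_pages(
    pages: List[Page],
    extract_image_content: Callable[[bytes], str],  # stand-in for the vision LLM call
) -> List[str]:
    """Return one enriched text per page, ready for the normal chunking/embedding flow."""
    enriched: List[str] = []
    for page in pages:
        parts = [page.text]
        for image in page.figures:
            content = extract_image_content(image)
            if content.strip():
                # Assumption: extracted content is appended after the page's own text.
                parts.append(content)
        enriched.append("\n\n".join(parts))
    return enriched


if __name__ == "__main__":
    def fake_extractor(image: bytes) -> str:
        return "Bar chart: revenue grew in every quarter."

    pages = [Page(1, "Quarterly report", [b"<figure bytes>"]), Page(2, "Appendix")]
    for text in enrich_pages(pages, fake_extractor):
        print(text)
        print("---")
```

The point of the sketch is the shape of the flow: extraction only adds text to pages, and everything downstream (chunking, embedding) sees ordinary page text.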
Configuration Fields
Field | Required | Description |
|---|---|---|
pdfReadMode | Yes | Must be DOC_INTELLIGENCE_DEFAULT for Image Content Extraction in PDF ingestion. |
pdfConfig.imageContentExtraction.enabled | Yes | Enables or disables Image Content Extraction for PDF ingestion. |
pdfConfig.imageContentExtraction.languageModel | Yes, when enabled | Vision-capable model used to extract image content. |
pdfConfig.imageContentExtraction.settings | No | Advanced configuration for image rendering, fallback model, and extraction strategy. |
pdfConfig.imageContentExtraction.settings.imageProcessingConfig.dpiValue | No | DPI used when rendering PDF pages as images. Default is 150. |
pdfConfig.imageContentExtraction.settings.imageProcessingConfig.compressionQuality | No | Image compression quality. Default is 50. |
pdfConfig.imageContentExtraction.settings.languageModelFallbackConfig | No | Optional per-config fallback model. If omitted, the service default is used. |
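pdfConfig.imageContentExtraction.settings.strategy | No | Extraction strategy. One of ONE_STEP (default) or TWO_STEP. |
pdfConfig.imageContentExtraction.settings.strategyConfig | No | Strategy-specific overrides, including the system and user prompts used for image extraction. Schema depends on settings.strategy. |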
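As a reading aid, the sketch below resolves the effective image-extraction settings from an `ingestionConfig` dictionary and applies the defaults listed in the table (DPI 150, compression quality 50, `ONE_STEP` strategy). The function name `effective_image_settings` is hypothetical, and treating a missing `enabled` as disabled is an assumption.

```python
from typing import Any, Dict

# Documented defaults from the configuration table.
DEFAULT_DPI = 150
DEFAULT_COMPRESSION_QUALITY = 50
DEFAULT_STRATEGY = "ONE_STEP"


def effective_image_settings(ingestion_config: Dict[str, Any]) -> Dict[str, Any]:
    """Read pdfConfig.imageContentExtraction and fill in the documented defaults."""
    ice = (ingestion_config.get("pdfConfig") or {}).get("imageContentExtraction") or {}
    settings = ice.get("settings") or {}
    image_cfg = settings.get("imageProcessingConfig") or {}
    return {
        # Assumption: a missing or non-true "enabled" means the feature is off.
        "enabled": ice.get("enabled") is True,
        "languageModel": ice.get("languageModel"),
        "dpiValue": image_cfg.get("dpiValue", DEFAULT_DPI),
        "compressionQuality": image_cfg.get("compressionQuality", DEFAULT_COMPRESSION_QUALITY),
        # None means "use the service default fallback model".
        "languageModelFallbackConfig": settings.get("languageModelFallbackConfig"),
        "strategy": settings.get("strategy", DEFAULT_STRATEGY),
        "strategyConfig": settings.get("strategyConfig") or {},
    }
```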
Where Can This Be Configured?
Image Content Extraction is configured through the platform ingestionConfig.
Place | How it is configured | Scope |
|---|---|---|
Admin / Knowledge Upload UI | The ingestion configuration form writes pdfConfig.imageContentExtraction into the folder ingestion config. | Folder / scope default |
GraphQL setScopeProperties | Set properties.ingestionConfig on a scope. | Folder / scope default, optionally subfolders |
GraphQL contentUpsert / contentUpsertByChat | Pass input.ingestionConfig when uploading or upserting content. | Single uploaded content item |
SDK upload helpers | Pass the same object as ingestion_config / ingestionConfig, depending on the SDK language helper. | Single uploaded content item |
Folder-level configuration is used as the default for content uploaded into that folder. A per-upload ingestionConfig can override it for one content item.
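The precedence rule can be written down as a small function. This is a sketch under one explicit assumption: that a per-upload `ingestionConfig` replaces the folder default as a whole rather than being deep-merged into it, since this page does not specify the merge behavior. The function name is hypothetical.

```python
from typing import Any, Dict, Optional

IngestionConfig = Dict[str, Any]


def effective_ingestion_config(
    folder_default: Optional[IngestionConfig],
    upload_override: Optional[IngestionConfig],
) -> Optional[IngestionConfig]:
    """Pick the ingestionConfig that applies to one uploaded content item."""
    # A per-upload ingestionConfig takes precedence for this one item.
    # Assumption: the override replaces the folder default as a whole (no deep merge).
    if upload_override is not None:
        return upload_override
    # Otherwise the folder-level configuration acts as the default.
    return folder_default
```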
What `ingestionConfig` Looks Like
`pdfReadMode` and `pdfConfig` are sibling fields inside the same `ingestionConfig` object.
Example:
```json
{
  "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
  "pdfConfig": {
    "imageContentExtraction": {
      "enabled": true,
      "languageModel": "AZURE_GPT_51_2025_1113",
      "settings": {
        "imageProcessingConfig": {
          "dpiValue": 150,
          "compressionQuality": 50
        },
        "languageModelFallbackConfig": "AZURE_GPT_4o_2024_0513"
      }
    }
  }
}
```

Only add settings when there is a specific reason, such as image quality tuning or an engineering-approved fallback model override.
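The same object is embedded at different paths depending on where it is applied: `properties.ingestionConfig` for `setScopeProperties` and `input.ingestionConfig` for `contentUpsert`. The Python sketch below only illustrates that nesting; the variable names are hypothetical, and every other argument these GraphQL calls need is omitted.

```python
import json

ingestion_config = {
    "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
    "pdfConfig": {  # sibling of pdfReadMode, not nested under it
        "imageContentExtraction": {
            "enabled": True,
            "languageModel": "AZURE_GPT_51_2025_1113",
        }
    },
}

# Partial payload fragments: only the ingestionConfig nesting is shown.
# Other required arguments of these mutations are deliberately omitted.
set_scope_properties_fragment = {"properties": {"ingestionConfig": ingestion_config}}
content_upsert_fragment = {"input": {"ingestionConfig": ingestion_config}}

print(json.dumps(content_upsert_fragment, indent=2))
```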
Recommended Defaults
For most tenants:
```json
{
  "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
  "pdfConfig": {
    "imageContentExtraction": {
      "enabled": true,
      "languageModel": "<vision-capable-model>"
    }
  }
}
```

Only add settings if there is a specific reason, such as image quality tuning or engineering-guided prompt changes.
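If you generate configurations programmatically, a small helper can keep the recommended shape and add `settings` only on request. `build_ingestion_config` is a hypothetical name, not an SDK function; its result is meant to be passed wherever an `ingestionConfig` object is accepted.

```python
from typing import Any, Dict, Optional


def build_ingestion_config(
    language_model: str,
    settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the recommended ingestionConfig, adding settings only when given."""
    image_extraction: Dict[str, Any] = {
        "enabled": True,
        "languageModel": language_model,  # must be a vision-capable model
    }
    if settings:
        # Only for a specific reason, e.g. image quality tuning or prompt changes.
        image_extraction["settings"] = settings
    return {
        "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
        "pdfConfig": {"imageContentExtraction": image_extraction},
    }
```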
Customizing the Extractor Prompts
By default, the vision LLM that extracts image content uses hardcoded prompts defined in `agentic-ingestion`. Both the system prompt and the user prompt can be overridden per scope, per assistant, or per upload through `strategyConfig`, without redeploying the service.
The available prompt fields and their resolution order depend on settings.strategy.
ONE_STEP (default strategy)
Two configurable prompt fields, both at pdfConfig.imageContentExtraction.settings.strategyConfig:
Field | Type | Resolution order (first non-empty wins) |
|---|---|---|
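systemPrompt | string | request (strategyConfig.systemPrompt) → environment variable → hardcoded default |
userPrompt | string | request (strategyConfig.userPrompt) → environment variable → hardcoded default |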
Each prompt is resolved independently (you can override the system prompt without overriding the user prompt). Empty strings are treated as unset and fall through to the next level.
`{language}` placeholder. The placeholder is substituted only when the prompt comes from an environment variable or from the hardcoded default. When you supply `systemPrompt` or `userPrompt` via `strategyConfig`, the string is sent to the LLM verbatim, so either write the language into the text or omit the language instruction.
Structured output. The service binds the LLM response to a { "reasoning": "...", "image_content": "..." } JSON schema (Pydantic structured output). A custom system prompt should describe this output shape so the model complies; if it does not, the primary model call fails and the request falls back to the fallback model.
Auto-injected context. When a figure caption is available, the service prepends a caption hint to the system prompt; when a full-page image is sent as context, it prepends a page-context notice. These additions cannot be disabled per request.
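The resolution rules for a single `ONE_STEP` prompt can be summarized in a few lines of Python. This is a sketch of the documented behavior, not the service code: the function name and parameters are hypothetical, the environment-variable names are not given on this page so the value is passed in directly, and using `str.replace` for the `{language}` placeholder is an assumption. Call it once for the system prompt and once for the user prompt, since each is resolved independently.

```python
from typing import Optional


def resolve_one_step_prompt(
    request_value: Optional[str],
    env_value: Optional[str],
    hardcoded_default: str,
    language: str,
) -> str:
    """Resolve one ONE_STEP prompt (system or user); the first non-empty source wins."""
    # 1. strategyConfig value from the request: sent verbatim, no {language} substitution.
    if request_value:  # None and "" both count as unset
        return request_value
    # 2. Environment variable, else 3. hardcoded default: {language} is substituted.
    template = env_value or hardcoded_default
    # str.replace (not str.format) leaves other braces, e.g. JSON examples, untouched.
    return template.replace("{language}", language)
```

Note how a request-level prompt containing `{language}` comes back unchanged, which is exactly the pitfall described above.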
Example:
```json
{
  "pdfReadMode": "DOC_INTELLIGENCE_DEFAULT",
  "pdfConfig": {
    "imageContentExtraction": {
      "enabled": true,
      "languageModel": "AZURE_GPT_51_2025_1113",
      "settings": {
        "strategy": "ONE_STEP",
        "strategyConfig": {
          "systemPrompt": "You are a legal-document image transcriber. Respond in English.\n\nOUTPUT FORMAT (JSON):\n{ \"reasoning\": \"string\", \"image_content\": \"string\" }",
          "userPrompt": "Extract every visible clause, signature, and stamp from this image. Respond in English."
        }
      }
    }
  }
}
```

TWO_STEP (classify, then extract)
When `strategy` is `"TWO_STEP"`, each cropped figure is first classified into one of the keys of `extractorCategoryToSystemPrompts` or one of the categories listed in `noExtractionForCategories`. A category-specific extractor prompt is then run.
All prompt fields live at pdfConfig.imageContentExtraction.settings.strategyConfig. There is no env-var override layer for TWO_STEP — the resolution chain is request → hardcoded default only.
Field | Type | Default |
|---|---|---|
| string |
|
| string |
|
| object (string → string) | 7 hardcoded entries: |
| object (string → string) | 7 hardcoded entries (same keys as above) |
| string |
|
| array of string |
|
| integer |
|
Note: No `{language}` substitution. The `TWO_STEP` path does not perform any language substitution, so bake the language into your strings.
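The classify-then-extract flow can be sketched like this. The function and its callables are hypothetical stand-ins for the two LLM calls; `extractor_prompts` and `no_extraction` mirror `extractorCategoryToSystemPrompts` and `noExtractionForCategories`, and returning nothing for a no-extraction category is an assumption based on the field name rather than documented behavior.

```python
from typing import Callable, Dict, List, Optional


def two_step_extract(
    image: bytes,
    classify: Callable[[bytes, List[str]], str],  # stand-in for the classifier LLM call
    extract: Callable[[bytes, str], str],         # stand-in for the extractor LLM call
    extractor_prompts: Dict[str, str],            # like extractorCategoryToSystemPrompts
    no_extraction: List[str],                     # like noExtractionForCategories
) -> Optional[str]:
    """Classify a cropped figure, then run the category-specific extractor prompt."""
    # The classifier chooses among extractor categories and no-extraction categories.
    categories = list(extractor_prompts) + list(no_extraction)
    category = classify(image, categories)
    if category not in extractor_prompts:
        # Assumption: no-extraction (or unknown) categories produce no content.
        return None
    # Prompts are used as written: no {language} substitution on this path.
    return extract(image, extractor_prompts[category])
```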
Where overrides can be set
The same strategyConfig shape is accepted at every layer; only the JSON path used to reach it differs.
Layer | Path to |
|---|---|
GraphQL |
|
GraphQL |
|
SDK upload helpers ( | same path under the helper's ingestion-config argument |
Assistant | same path |
Knowledge-Upload UI Configuration textarea |
|
Direct |
|
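Because the `strategyConfig` shape is the same at every layer, a tiny helper can place one override object under different roots. The sketch assumes the GraphQL roots `properties.ingestionConfig` (`setScopeProperties`) and `input.ingestionConfig` (`contentUpsert`) described earlier on this page; `nest_at` and the variable names are hypothetical, and the fragments omit every other field those calls require.

```python
from typing import Any, Dict


def nest_at(path: str, value: Any) -> Dict[str, Any]:
    """Wrap value in nested dicts along a dotted path: "a.b" -> {"a": {"b": value}}."""
    result: Any = value
    for key in reversed(path.split(".")):
        result = {key: result}
    return result


# Path from the ingestionConfig object down to strategyConfig.
STRATEGY_CONFIG_SUFFIX = "pdfConfig.imageContentExtraction.settings.strategyConfig"

strategy_config = {"userPrompt": "Extract every visible clause. Respond in English."}

# Same strategyConfig shape, different roots.
ingestion_config_fragment = nest_at(STRATEGY_CONFIG_SUFFIX, strategy_config)
scope_fragment = nest_at("properties.ingestionConfig." + STRATEGY_CONFIG_SUFFIX, strategy_config)
upload_fragment = nest_at("input.ingestionConfig." + STRATEGY_CONFIG_SUFFIX, strategy_config)
```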
Common Mistakes
Putting pdfConfig outside of ingestionConfig.
Putting imageContentExtraction directly at the root instead of under pdfConfig.
Enabling Image Content Extraction without selecting a vision-capable language model.
Selecting a model that does not support image input.
Changing dpiValue without considering image quality, latency, and token cost.
Overriding prompts via `strategyConfig` without keeping the structured output contract. The service binds responses to `{ "reasoning": "string", "image_content": "string" }`; a custom system prompt should describe this shape so the model complies. If it does not, the primary call fails and the request falls back to the fallback model.
Mixing `ONE_STEP` and `TWO_STEP` keys in the same `strategyConfig`. Keys for the strategy you are not running are silently ignored.
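Several of these mistakes can be caught before upload with a quick structural check. The sketch below is hypothetical (it is not a platform validator) and takes the object that should contain `ingestionConfig`, such as a GraphQL `input` or `properties` object. The strategy-key check covers only the keys named on this page, and only in one direction: it flags `TWO_STEP` keys under `ONE_STEP`, because the full `TWO_STEP` key list is not reproduced here.

```python
from typing import Any, Dict, List

# Only the TWO_STEP keys named on this page; the full key set may be larger.
TWO_STEP_KEYS = {"extractorCategoryToSystemPrompts", "noExtractionForCategories"}


def lint_config_container(container: Dict[str, Any]) -> List[str]:
    """Check the object that holds ingestionConfig (e.g. a GraphQL input) for common mistakes."""
    problems: List[str] = []
    if "pdfConfig" in container:
        problems.append("pdfConfig is outside ingestionConfig")
    ingestion = container.get("ingestionConfig") or {}
    if "imageContentExtraction" in ingestion:
        problems.append("imageContentExtraction must be under pdfConfig, not at the root")
    ice = (ingestion.get("pdfConfig") or {}).get("imageContentExtraction") or {}
    if ice.get("enabled") and not ice.get("languageModel"):
        problems.append("enabled without a languageModel")
    settings = ice.get("settings") or {}
    # ONE_STEP is the default strategy when none is set.
    if settings.get("strategy", "ONE_STEP") == "ONE_STEP":
        ignored = set(settings.get("strategyConfig") or {}) & TWO_STEP_KEYS
        if ignored:
            problems.append(f"TWO_STEP keys ignored under ONE_STEP: {sorted(ignored)}")
    return problems
```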
Infra Dependencies
Image Content Extraction also depends on platform and deployment configuration:
The Image Content Extraction feature flag must be enabled.
`agentic-ingestion` must be deployed and reachable by `node-ingestion-worker`.
Vision-capable models must be configured and available.
Redis/job queue configuration for `agentic-ingestion` must be healthy.
These are infra/operator concerns and should be documented in the Infra Admin page.
Troubleshooting Checklist
If Image Content Extraction does not run:
Check that `pdfReadMode` is `DOC_INTELLIGENCE_DEFAULT`.
Check that `pdfConfig.imageContentExtraction.enabled` is `true`.
Check that `languageModel` is configured.
Check that the selected model supports vision.
Check that the feature flag is enabled.
Check that `node-ingestion-worker` can reach `agentic-ingestion`.
Check `agentic-ingestion` logs for image extraction errors.
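The configuration-side items of this checklist can be automated with a check like the one below. It is a hypothetical helper: this page does not provide a list of vision-capable models, so the caller supplies `vision_models`, and the feature flag, connectivity, and log checks remain manual.

```python
from typing import Any, Dict, Iterable, List


def config_checks(ingestion_config: Dict[str, Any], vision_models: Iterable[str]) -> List[str]:
    """Return failed config-side checks; infra checks still need to be done by hand."""
    failures: List[str] = []
    if ingestion_config.get("pdfReadMode") != "DOC_INTELLIGENCE_DEFAULT":
        failures.append("pdfReadMode is not DOC_INTELLIGENCE_DEFAULT")
    ice = (ingestion_config.get("pdfConfig") or {}).get("imageContentExtraction") or {}
    if ice.get("enabled") is not True:
        failures.append("pdfConfig.imageContentExtraction.enabled is not true")
    model = ice.get("languageModel")
    if not model:
        failures.append("languageModel is not configured")
    elif model not in set(vision_models):
        failures.append(f"{model} is not in the list of vision-capable models")
    return failures


# Remaining checklist items cannot be verified from the config alone.
MANUAL_CHECKS = [
    "Image Content Extraction feature flag is enabled",
    "node-ingestion-worker can reach agentic-ingestion",
    "agentic-ingestion logs show no image extraction errors",
]
```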