Ingestion Worker Configuration

The Ingestion Worker (IW) extracts text from every document uploaded to the Unique platform, using either Microsoft Document Intelligence (MDI) or a custom API (Docling, Unique Agentic Ingestion (UAI), etc.), and splits it into distinct chunks. The IW serves two use cases, and a separate container is deployed for each:

  • Upload-to-Chat (UtC): node-ingestion-worker-chat

  • Upload-to-Knowledge-Base (UtKB): node-ingestion-worker

For a smooth ingestion experience, configure the workers according to the size of your company and how often UtC and UtKB are used.

CPU / Memory per worker

Default Worker (Knowledge Base, node-ingestion-worker) - Optimal resource requirements

resources:
  requests:
    cpu: 3
    memory: 3000Mi
  limits:
    cpu: 3
    memory: 3050Mi

Chat Worker - Optimal resource requirements

resources:
  limits:
    cpu: 3.5
    memory: 8050Mi
  requests:
    cpu: 3.5
    memory: 8000Mi
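
For capacity planning, the per-worker limits above can be multiplied by the number of replicas to estimate what the node pool has to provide. The following is a minimal Python sketch, not part of any Unique tooling: the names `WorkerResources`, `parse_mi`, and `total_footprint` are illustrative, and the parser only handles memory quantities written in Mi, which is the only unit used on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResources:
    """Per-replica resource limits, copied from the blocks above."""
    cpu: float       # CPU cores
    memory_mi: int   # memory in Mi


def parse_mi(quantity: str) -> int:
    """Parse a memory quantity given in Mi, e.g. "3050Mi" -> 3050.

    Only the "Mi" suffix is supported (assumption: sufficient for this page).
    """
    if not quantity.endswith("Mi"):
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    return int(quantity[:-2])


def total_footprint(worker: WorkerResources, replicas: int) -> tuple[float, int]:
    """Return (total CPU cores, total memory in Mi) for `replicas` replicas."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return worker.cpu * replicas, worker.memory_mi * replicas


# Limits from the two resource blocks above.
DEFAULT_WORKER = WorkerResources(cpu=3, memory_mi=parse_mi("3050Mi"))
CHAT_WORKER = WorkerResources(cpu=3.5, memory_mi=parse_mi("8050Mi"))

if __name__ == "__main__":
    # Example: 3 default workers and 2 chat workers running at the same time.
    print(total_footprint(DEFAULT_WORKER, 3))  # (9, 9150)
    print(total_footprint(CHAT_WORKER, 2))     # (7.0, 16100)
```

Because requests and limits are nearly identical for both workers, the limit-based total is also a reasonable estimate of what the scheduler will reserve.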

Auto-scaling Configuration

To ensure an optimal user experience in the UtC use case, it is essential to avoid any latency caused by container bootstrapping, as this would directly impact users. Typically, no documents are uploaded to the chat after business hours. Therefore, we suggest a cron-based auto-scaling schedule that keeps a baseline of replicas running during business hours and scales down afterwards.

In contrast, some latency is acceptable for the UtKB use case, since the user experience is not directly affected. However, documents may be uploaded or ingested overnight, so this worker is configured with minimum and maximum replica counts rather than a business-hours schedule.

Please note that the configuration below should be considered a starting point and will likely need to be further adapted and fine-tuned to align with your specific company environment.
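
To get a feel for what a cron window costs, the number of hours per week during which replicas are kept warm can be worked out from the schedule's `start` and `end` expressions. The sketch below is a deliberately simplified Python helper and `weekly_warm_hours` is a hypothetical name. It assumes five-field cron expressions with a fixed minute and hour, `*` for day-of-month and month, an identical day-of-week field on both expressions (either `*`, a single day, or a simple range such as `1-5`), and a window that starts and ends on the same day. It does not implement full cron syntax.

```python
def parse_simple_cron(expr: str) -> tuple[int, int, str]:
    """Return (minute, hour, day_of_week_field) for a simple cron string."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expr!r}")
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_month != "*" or month != "*":
        raise ValueError("only '*' is supported for day-of-month and month")
    return int(minute), int(hour), day_of_week


def days_per_week(day_of_week: str) -> int:
    """Count the days covered by '*', a single day ('3') or a range ('1-5')."""
    if day_of_week == "*":
        return 7
    first, _, last = day_of_week.partition("-")
    return int(last or first) - int(first) + 1


def weekly_warm_hours(start: str, end: str) -> float:
    """Hours per week between the scale-up and the scale-down expression."""
    start_min, start_hour, start_dow = parse_simple_cron(start)
    end_min, end_hour, end_dow = parse_simple_cron(end)
    if start_dow != end_dow:
        raise ValueError("start and end must use the same day-of-week field")
    minutes_per_day = (end_hour * 60 + end_min) - (start_hour * 60 + start_min)
    if minutes_per_day <= 0:
        raise ValueError("end must be later in the day than start")
    return minutes_per_day * days_per_week(start_dow) / 60


if __name__ == "__main__":
    # Weekday window from 07:00 to 19:00.
    print(weekly_warm_hours("0 7 * * 1-5", "0 19 * * 1-5"))  # 60.0
```

Multiplying the result by the number of replicas kept warm gives replica-hours per week: one warm replica on a 07:00 to 19:00 weekday window amounts to 60 replica-hours.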

Small Usage (< 50 MAU) recommendations

  • 20 - 100 documents per day on average

  • 3 - 8 peak concurrent users

# Knowledge Base Worker
eventBasedAutoscaling:
  minReplicaCount: 1
  maxReplicaCount: 3
  
# Chat Worker  
eventBasedAutoscaling:
  maxReplicaCount: 2
  cron:
    start: 0 7 * * 1-5    # Scale up during business hours
    end: 0 19 * * 1-5     # Scale down after hours
    desiredReplicas: "1"

Rationale: Low concurrent document processing, occasional chat usage, mostly single-user sessions.

Medium Usage (50-200 MAU) recommendations

  • 100 - 500 documents per day on average

  • 8 - 25 peak concurrent users

# Knowledge Base Worker
eventBasedAutoscaling:
  minReplicaCount: 2
  maxReplicaCount: 6
  
# Chat Worker
eventBasedAutoscaling:
  maxReplicaCount: 4
  cron:
    start: 0 7 * * 1-5    # Scale up during business hours
    end: 0 19 * * 1-5     # Scale down after hours
    desiredReplicas: "2"

Rationale: Moderate concurrent usage, overlapping business hours, predictable usage patterns.

High Usage (200-800 MAU) recommendations

  • 500 - 2,000 document uploads per day on average

  • 25 - 80 peak concurrent users

# Knowledge Base Worker
eventBasedAutoscaling:
  minReplicaCount: 3
  maxReplicaCount: 10
  
# Chat Worker
eventBasedAutoscaling:
  maxReplicaCount: 8
  cron:
    start: 0 6 * * 1-5    # Scale up for extended business hours
    end: 0 20 * * 1-5     # Scale down after hours
    desiredReplicas: "4"

Rationale: High concurrent usage, extended business hours, multiple time zones, heavy document processing.

Enterprise Usage (800+ MAU) recommendations

  • 2,000+ document uploads per day on average

  • 80+ peak concurrent users

# Knowledge Base Worker
eventBasedAutoscaling:
  minReplicaCount: 4
  maxReplicaCount: 15
  
# Chat Worker
eventBasedAutoscaling:
  maxReplicaCount: 12
  cron:
    start: 0 5 * * 1-7    # Scale up at 05:00, seven days a week
    end: 0 23 * * 1-7     # Scale down at 23:00, seven days a week
    desiredReplicas: "6"

Rationale: Continuous operations, global usage patterns, high document volume, business-critical workloads.
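
The four tiers can be summarised as a small lookup keyed on MAU. The Python sketch below only restates the values from this page; the `Tier` structure and the `recommend_tier` function are hypothetical names introduced for illustration. The MAU ranges on this page share their boundary values (50, 200, 800), so the sketch assumes each lower bound is inclusive, meaning that exactly 200 MAU maps to the High tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_mau: int                # inclusive lower bound (assumption)
    kb_min_replicas: int        # Knowledge Base worker minReplicaCount
    kb_max_replicas: int        # Knowledge Base worker maxReplicaCount
    chat_max_replicas: int      # Chat worker maxReplicaCount
    chat_desired_replicas: int  # Chat worker cron desiredReplicas
    cron_start: str
    cron_end: str


# Values copied from the four recommendation blocks above, ordered by MAU.
TIERS = (
    Tier("small", 0, 1, 3, 2, 1, "0 7 * * 1-5", "0 19 * * 1-5"),
    Tier("medium", 50, 2, 6, 4, 2, "0 7 * * 1-5", "0 19 * * 1-5"),
    Tier("high", 200, 3, 10, 8, 4, "0 6 * * 1-5", "0 20 * * 1-5"),
    Tier("enterprise", 800, 4, 15, 12, 6, "0 5 * * 1-7", "0 23 * * 1-7"),
)


def recommend_tier(mau: int) -> Tier:
    """Return the highest tier whose lower MAU bound is not above `mau`."""
    if mau < 0:
        raise ValueError("mau must be non-negative")
    chosen = TIERS[0]
    for tier in TIERS:
        if mau >= tier.min_mau:
            chosen = tier
    return chosen


if __name__ == "__main__":
    tier = recommend_tier(320)
    print(tier.name, tier.kb_max_replicas, tier.chat_desired_replicas)
    # high 10 4
```

As with the recommendations themselves, the result is a starting point: document volume and peak concurrency may justify moving up or down a tier regardless of MAU.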
