Confluence Connector
Pre-release Notice: The Confluence Connector v2 is currently in alpha (2.0.0-alpha.4). Configuration options and behavior may change before the stable release.
Overview
The Confluence Connector is a service that synchronizes page content and file attachments from Confluence to the Unique knowledge base for RAG ingestion. It supports both Confluence Cloud and Confluence Data Center deployments.
For deployment, configuration, and operational details, see the IT Operator Guide.
Quick Summary
What it does: Synchronizes labeled Confluence pages and their attachments to Unique's AI knowledge base
Supported platforms: Confluence Cloud and Confluence Data Center
Authentication: OAuth 2.0 two-legged (recommended; Cloud and Data Center 10.1+) or Personal Access Token (Data Center below 10.1 only; not recommended)
Scheduling: Configurable automated scans (default: every 15 minutes)
Multi-tenancy: Multiple Confluence instances can be managed in a single deployment
Deployment: Kubernetes-based containerized application
Requirements
Confluence
| Requirement | Details |
|---|---|
| Confluence Cloud | Active instance with an Atlassian Cloud ID |
| Confluence Data Center | Self-hosted instance with REST API access |
| Authentication | OAuth 2.0 application credentials (recommended) or a Personal Access Token (Data Center below 10.1 only; not recommended) |
| Permissions | Read access to spaces and pages that should be synchronized |
Prerequisites:
Ability to create and apply labels on Confluence pages
An OAuth 2.0 application configured in Confluence (recommended), or a PAT for Data Center below 10.1 only (not recommended)
A configured scope in the Unique platform to receive ingested content
Authentication Methods
The connector supports OAuth 2.0 two-legged (2LO) for both Confluence Cloud and Data Center (10.1+), which is the recommended authentication method. Personal Access Token (PAT) is supported only on Data Center versions below 10.1 where OAuth 2.0 (2LO) is not available, and is not recommended. For Unique platform communication, cluster_local mode is available for in-cluster deployments and external mode for out-of-cluster deployments via Zitadel OAuth. See the Authentication Guide for full setup instructions, credential management, and token flows.
Features
Core Capabilities
Label-Driven Page Discovery
Pages are discovered via configurable Confluence labels and ingested in HTML format (Confluence storage representation)
A configurable label (e.g. ai-ingest) marks individual pages for synchronization
A second configurable label (e.g. ai-ingest-all) marks a page and all its descendant pages for synchronization
Operators must explicitly set both label names in their tenant configuration
Only pages in global spaces are scanned (Cloud also includes collaboration spaces)
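The label rules above can be illustrated with a minimal in-memory sketch. It is not the connector's implementation: the `Page` type, the `select_pages` helper, and the hard-coded label names are assumptions made for illustration, and the real connector discovers pages through the Confluence API rather than from a prebuilt list.

```python
from dataclasses import dataclass, field
from typing import Optional

# Example label names only; operators set their own in the tenant configuration.
INGEST_LABEL = "ai-ingest"
INGEST_ALL_LABEL = "ai-ingest-all"


@dataclass
class Page:
    """Hypothetical, simplified view of a Confluence page."""
    id: str
    parent_id: Optional[str]
    labels: set = field(default_factory=set)


def select_pages(pages):
    """Return the IDs of pages marked for synchronization.

    A page is selected if it carries either label itself, or if any
    ancestor carries the "ingest all" label.
    """
    by_id = {p.id: p for p in pages}
    selected = set()
    for page in pages:
        if INGEST_LABEL in page.labels or INGEST_ALL_LABEL in page.labels:
            selected.add(page.id)
            continue
        # Walk up the ancestor chain; `seen` guards against malformed cycles.
        seen = set()
        parent_id = page.parent_id
        while parent_id in by_id and parent_id not in seen:
            seen.add(parent_id)
            parent = by_id[parent_id]
            if INGEST_ALL_LABEL in parent.labels:
                selected.add(page.id)
                break
            parent_id = parent.parent_id
    return selected
```

Note that a child of a page labeled only with ai-ingest is not selected in this sketch; only ai-ingest-all propagates to descendants.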
Automatic Change Detection
A per-space file diff mechanism compares discovered items against the state stored in Unique, ingesting only new or modified items and removing deleted ones
Safety checks prevent accidental full deletion of content
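As a rough sketch of the idea behind a per-space diff, the following compares two key-to-version maps. The `Diff` type, both function names, and the shape of the maps are assumptions. The specific safety rule shown (refuse to apply when discovery returns nothing but stored content exists) is one plausible heuristic, not necessarily the check the connector performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    new: set
    updated: set
    deleted: set


def compute_diff(discovered, stored):
    """Compare item-key -> version maps for a single space."""
    new = set(discovered) - set(stored)
    deleted = set(stored) - set(discovered)
    # Items present on both sides count as updated only if the version changed.
    updated = {k for k in set(discovered) & set(stored) if discovered[k] != stored[k]}
    return Diff(new, updated, deleted)


def is_safe_to_apply(discovered, stored):
    """Hypothetical guard against accidental full deletion.

    An empty discovery result while content is already stored more likely
    indicates an upstream failure than a genuine removal of everything.
    """
    return not (stored and not discovered)
```

For example, with stored state `{"a": "v1", "b": "v1", "c": "v1"}` and discovered state `{"a": "v1", "b": "v2", "d": "v1"}`, item `d` is new, `b` is updated, `c` is deleted, and `a` is left untouched.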
Attachment Ingestion
File attachments on labeled pages are discovered and ingested alongside page content
Attachment ingestion can be enabled or disabled
Configurable file size limit (default: 200 MB)
Configurable allowed MIME types. Defaults cover PDF, the major Office formats, plain text, CSV, HTML, PNG, and JPEG. See Configuration for the full list.
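The attachment rules above amount to a simple filter, sketched below. The function name, the constant names, and the MIME set (a subset chosen for illustration) are assumptions; see Configuration for the actual defaults. Treating 200 MB as 200 × 1024 × 1024 bytes and allowing a file exactly at the limit are also assumptions.

```python
# Assumption: "200 MB" interpreted as binary megabytes.
MAX_SIZE_BYTES = 200 * 1024 * 1024

# Illustrative subset only; the real default list is documented in Configuration.
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "text/plain",
    "text/csv",
    "text/html",
    "image/png",
    "image/jpeg",
}


def should_ingest(mime_type, size_bytes, enabled=True):
    """Return True if an attachment passes the enabled, type, and size checks."""
    if not enabled:
        return False
    if mime_type not in ALLOWED_MIME_TYPES:
        return False
    return size_bytes <= MAX_SIZE_BYTES
```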
Image Ingestion
When attachment ingestion is enabled, images embedded in Confluence pages (PNG and JPEG) are ingested as attachments. Confluence stores editor-inserted images (drag/drop, paste, "Insert image") as regular page attachments, so they flow through the same path as PDFs and Office files. Images inserted as external URLs (rather than uploaded) are not attachments and are not ingested.
The connector also requests OCR-based ingestion for each image automatically (attachments.imageOcr is enabled by default), so chunks are produced without any scope-side configuration. Set attachments.imageOcr = disabled to defer to the destination scope's own ingestionConfig.jpgReadMode. Other image formats (GIF, WebP, SVG, HEIC, BMP, TIFF) are not currently supported by the Unique ingestion service and should be left out of allowedMimeTypes.
Skipped Content Types
Content types database, whiteboard, and embed are explicitly skipped (no body available via API). Folders are not explicitly skipped but have no body, so they are excluded during ingestion. In both cases, descendants (such as sub-pages under a database or folder) are still discovered and ingested. Live Docs pass through as regular pages. See the Content Type Ingestion Map for the full breakdown by platform.
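The key point above is that skipping a node does not prune its subtree. A minimal traversal sketch, assuming a hypothetical `Node` type and an in-memory content tree (the connector itself works against the Confluence API):

```python
from dataclasses import dataclass, field
from typing import Optional

# Content types that are explicitly skipped, per the description above.
SKIPPED_TYPES = {"database", "whiteboard", "embed"}


@dataclass
class Node:
    """Hypothetical, simplified content-tree node."""
    id: str
    type: str
    body: Optional[str] = None
    children: list = field(default_factory=list)


def collect_ingestable(root):
    """Return IDs of nodes to ingest, in depth-first order.

    Skipped types and body-less nodes (such as folders) are excluded,
    but their descendants are still visited.
    """
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.type not in SKIPPED_TYPES and node.body:
            result.append(node.id)
        # Always descend, even when the node itself was excluded.
        stack.extend(reversed(node.children))
    return result
```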
Scope Management
A pre-existing root scope is configured per tenant (must be created in Unique before the connector starts), with child scopes automatically created per Confluence space
See the Scope Hierarchy for details
Scheduled Synchronization
Sync runs on a configurable cron schedule (default: */15 * * * *, every 15 minutes)
An initial sync is triggered immediately on startup for each tenant
Concurrent sync runs for the same tenant are prevented (the second run is skipped)
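The skip-if-running behavior can be sketched with a non-blocking lock per tenant. The `TenantSyncGuard` class and its method names are hypothetical, and the connector may implement this differently; the sketch only shows the pattern of skipping, rather than queuing, an overlapping run.

```python
import threading


class TenantSyncGuard:
    """Hypothetical guard that allows one sync per tenant at a time."""

    def __init__(self):
        self._locks = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, tenant):
        # Create each tenant's lock lazily and exactly once.
        with self._registry_lock:
            return self._locks.setdefault(tenant, threading.Lock())

    def run_if_idle(self, tenant, sync_fn):
        """Run sync_fn unless a sync for this tenant is already in progress.

        Returns True if the sync ran, False if it was skipped.
        """
        lock = self._lock_for(tenant)
        if not lock.acquire(blocking=False):
            return False  # previous cycle still running: skip this one
        try:
            sync_fn()
            return True
        finally:
            lock.release()
```

Because each tenant has its own lock, a long-running sync for one tenant does not block scheduled runs for another.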
Advanced Features
Multi-Tenancy
Multiple Confluence instances (tenants) can be configured in a single deployment, each with independent configuration, authentication, and sync schedules. See Architecture -- Multi-Tenancy Support for the isolation model and per-tenant service details.
Concurrency Control
Configurable page ingestion concurrency (default: 1)
Configurable API rate limits for both Confluence and Unique APIs
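A bounded-concurrency setting of this kind is commonly implemented with a semaphore. The sketch below uses Python's asyncio to show the pattern; `ingest_all` and the `ingest_one` callback are hypothetical names, and rate limiting is not modeled here.

```python
import asyncio


async def ingest_all(page_ids, ingest_one, concurrency=1):
    """Ingest pages with at most `concurrency` ingestions in flight.

    `ingest_one` is an async callable taking a page ID. Results are
    returned in the same order as `page_ids`.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(page_id):
        # Each task waits here until a slot is free.
        async with semaphore:
            return await ingest_one(page_id)

    return await asyncio.gather(*(bounded(p) for p in page_ids))
```

With the default of 1, pages are processed strictly one at a time; raising the value trades higher throughput for more load on both APIs.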
Observability
Structured JSON logging
OpenTelemetry metrics integration
Prometheus metrics endpoint
Security
OAuth 2.0 two-legged (2LO) authentication for Cloud and Data Center
Personal Access Token (PAT) support for Data Center below 10.1 only (not recommended; use OAuth 2.0 2LO on 10.1+)
Configurable rate limiting for Confluence and Unique API calls
v1-Compatible Key Format
Optional useV1KeyFormat setting for backward compatibility with Confluence Connector v1 ingestion keys
How It Works
High-Level Sync Flow

Content Sync Flow

See Technical Reference for detailed architecture and flow documentation.
User Workflow
Administrator Setup (One-time)
- Deploy the connector
- Configure tenant YAML with Confluence credentials and Unique API endpoints
- Set up the root scope in Unique
Confluence Users (Ongoing)
- Apply the ai-ingest label to individual pages they want synchronized
- Apply the ai-ingest-all label to a parent page to synchronize it and all its descendants
Automated Processing
- The connector scans for labeled pages on the configured schedule
- Discovers pages and their attachments
- Computes a diff against previously ingested content
- Ingests new and updated content, removes deleted content
Limitations and Constraints
Not Supported
Real-time synchronization (periodic scanning only)
Permission synchronization (content sync only)
Confluence databases, whiteboards, and embeds (these content types are automatically skipped)
Hierarchical scope structure (all pages from a space are placed in a single flat scope; sub-scopes mirroring the Confluence page tree are not created)
Considerations
| Constraint | Impact | Mitigation |
|---|---|---|
| Pages must be explicitly labeled | No automatic sync of unlabeled content | Document the labeling workflow for end users |
| Single ingestion mode (flat) | All pages from a space are ingested into a single scope per space | Organize content into separate spaces if scope separation is needed |
| Horizontal scaling not supported | Single instance deployment | Allocate adequate resources; tune per-tenant concurrency |
| Concurrent sync prevention | If a sync cycle for a tenant is still running when the next is scheduled, the new cycle is skipped | Adjust cron interval or concurrency settings for large instances |
Related Documentation
FAQ - Frequently asked questions and troubleshooting
For IT Operators
Operator Guide - Deployment, configuration, and operations
Authentication - Confluence and Unique auth setup
Configuration - Tenant config, environment variables, YAML settings
Deployment - Container and infrastructure setup
Technical Reference
Technical Reference - Architecture, flows, and design decisions
Architecture - System components and infrastructure
Flows - Sync flows, file diff, discovery
Permissions - Confluence API and Unique permissions
Security - Security practices and compliance
Standard References
Confluence Cloud REST API - Atlassian Confluence Cloud API documentation
Confluence Data Center REST API - Atlassian Confluence Data Center API documentation
Atlassian OAuth 2.0 (3LO) apps - Atlassian Cloud OAuth app setup (prerequisite for 2LO client credentials)
Confluence Query Language (CQL) - CQL reference for content search queries