Confluence Connector

Pre-release Notice: The Confluence Connector v2 is currently in alpha (2.0.0-alpha.4). Configuration options and behavior may change before the stable release.

Overview

The Confluence Connector is a service that synchronizes page content and file attachments from Confluence to the Unique knowledge base for RAG ingestion. It supports both Confluence Cloud and Confluence Data Center deployments.

For deployment, configuration, and operational details, see the IT Operator Guide.

Quick Summary

What it does: Synchronizes labeled Confluence pages and their attachments to Unique's AI knowledge base

Supported platforms: Confluence Cloud and Confluence Data Center

Authentication: OAuth 2.0 two-legged (recommended; Cloud and Data Center 10.1+) or Personal Access Token (Data Center below 10.1 only; not recommended)

Scheduling: Configurable automated scans (default: every 15 minutes)

Multi-tenancy: Multiple Confluence instances can be managed in a single deployment

Deployment: Kubernetes-based containerized application

Requirements

Confluence

| Requirement | Details |
| --- | --- |
| Confluence Cloud | Active instance with an Atlassian Cloud ID |
| Confluence Data Center | Self-hosted instance with REST API access |
| Authentication | OAuth 2.0 application credentials (recommended) or a Personal Access Token (Data Center below 10.1 only; not recommended) |
| Permissions | Read access to spaces and pages that should be synchronized |

Prerequisites:

  • Ability to create and apply labels on Confluence pages

  • An OAuth 2.0 application configured in Confluence (recommended), or a PAT for Data Center below 10.1 only (not recommended)

  • A configured scope in the Unique platform to receive ingested content

Authentication Methods

The connector supports OAuth 2.0 two-legged (2LO) authentication on both Confluence Cloud and Data Center 10.1+; this is the recommended method. Personal Access Token (PAT) authentication is supported only on Data Center versions below 10.1, where OAuth 2.0 (2LO) is not available, and is not recommended. For communication with the Unique platform, two modes are available: cluster_local for in-cluster deployments, and external for out-of-cluster deployments, which authenticates via Zitadel OAuth. See the Authentication Guide for full setup instructions, credential management, and token flows.
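
The platform and version rule above can be sketched as a small decision function. This is an illustration only: the function names and the identifiers "cloud", "data_center", "oauth2_2lo", and "pat" are hypothetical and are not the connector's actual configuration values.

```python
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '10.1.2'."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def recommended_auth(platform: str, dc_version: Optional[str] = None) -> str:
    """Pick the authentication method described in the text above.

    All string identifiers here are illustrative assumptions.
    """
    if platform == "cloud":
        return "oauth2_2lo"
    if platform == "data_center":
        if dc_version is None:
            raise ValueError("Data Center requires a version to choose an auth method")
        # OAuth 2.0 (2LO) is available on Data Center 10.1 and later.
        if parse_major_minor(dc_version) >= (10, 1):
            return "oauth2_2lo"
        # Below 10.1 only a Personal Access Token is possible (not recommended).
        return "pat"
    raise ValueError(f"unknown platform: {platform!r}")
```

For example, recommended_auth("data_center", "9.2.3") returns "pat", while any Cloud instance or Data Center 10.1+ returns "oauth2_2lo".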

Features

Core Capabilities

Label-Driven Page Discovery

  • Pages are discovered via configurable Confluence labels and ingested in HTML format (Confluence storage representation)

  • A configurable label (e.g. ai-ingest) marks individual pages for synchronization

  • A second configurable label (e.g. ai-ingest-all) marks a page and all its descendant pages for synchronization

  • Operators must explicitly set both label names in their tenant configuration

  • Only pages in global spaces are scanned (Cloud also includes collaboration spaces)
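
The two-label selection rule above can be illustrated with an in-memory sketch. It assumes the page tree is already loaded as a list of records; the Page type and select_pages function are hypothetical, and this does not show how the connector actually queries Confluence.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Page:
    """Hypothetical minimal page record used only for this sketch."""
    id: str
    parent_id: Optional[str]
    labels: Set[str] = field(default_factory=set)


def select_pages(pages: Iterable[Page], single_label: str, tree_label: str) -> Set[str]:
    """Return the ids of pages marked for synchronization.

    single_label marks one page; tree_label marks a page and all descendants.
    """
    pages = list(pages)
    # Index children by parent so subtrees can be walked.
    children: Dict[str, List[str]] = {}
    for page in pages:
        if page.parent_id is not None:
            children.setdefault(page.parent_id, []).append(page.id)

    selected: Set[str] = set()
    stack: List[str] = []
    for page in pages:
        if single_label in page.labels:
            selected.add(page.id)
        if tree_label in page.labels:
            stack.append(page.id)

    # Depth-first walk of every subtree rooted at a tree-labeled page.
    visited: Set[str] = set()
    while stack:
        page_id = stack.pop()
        if page_id in visited:
            continue
        visited.add(page_id)
        stack.extend(children.get(page_id, []))

    return selected | visited
```

Both label names are passed in explicitly, mirroring the requirement that operators set them in the tenant configuration.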

Automatic Change Detection

  • A per-space file diff mechanism compares discovered items against the state stored in Unique, ingesting only new or modified items and removing deleted ones

  • Safety checks prevent accidental full deletion of content
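
A minimal sketch of such a diff, assuming each item is identified by a key and carries a fingerprint (for example a version number or content hash). The names below are hypothetical, and the safety check shown (refusing to act when nothing was discovered but content is stored) is one plausible interpretation, not the connector's documented rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Diff:
    new: List[str]
    updated: List[str]
    deleted: List[str]


class FullDeletionError(Exception):
    """Raised when a diff would remove all stored content for a space."""


def compute_diff(discovered: Dict[str, str], stored: Dict[str, str]) -> Diff:
    """Compare discovered items (key -> fingerprint) with stored state."""
    # Assumed safety check: an empty discovery result against non-empty
    # stored state is more likely an upstream failure than a real deletion.
    if stored and not discovered:
        raise FullDeletionError("refusing to delete all stored items")

    new = sorted(k for k in discovered if k not in stored)
    updated = sorted(k for k in discovered if k in stored and discovered[k] != stored[k])
    deleted = sorted(k for k in stored if k not in discovered)
    return Diff(new=new, updated=updated, deleted=deleted)
```

Only the keys in new and updated would be sent for ingestion; keys in deleted would be removed from the scope, and unchanged items are left alone.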

Attachment Ingestion

  • File attachments on labeled pages are discovered and ingested alongside page content

  • Attachment ingestion can be enabled or disabled

  • Configurable file size limit (default: 200 MB)

  • Configurable allowed MIME types. Defaults cover PDF, the major Office formats, plain text, CSV, HTML, PNG, and JPEG. See Configuration for the full list.
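
The attachment rules above amount to a three-part filter: ingestion enabled, MIME type allowed, size within the limit. The sketch below is illustrative. The function name is hypothetical, the MIME set is an example subset rather than the connector's actual default list, and treating 200 MB as 200 * 1024 * 1024 bytes is an assumption.

```python
from typing import FrozenSet

# Assumption: 200 MB interpreted as mebibytes.
DEFAULT_MAX_BYTES = 200 * 1024 * 1024

# Illustrative subset only; see Configuration for the real defaults.
EXAMPLE_ALLOWED_MIME_TYPES: FrozenSet[str] = frozenset({
    "application/pdf",
    "text/plain",
    "text/csv",
    "text/html",
    "image/png",
    "image/jpeg",
})


def should_ingest_attachment(
    mime_type: str,
    size_bytes: int,
    *,
    enabled: bool = True,
    allowed: FrozenSet[str] = EXAMPLE_ALLOWED_MIME_TYPES,
    max_bytes: int = DEFAULT_MAX_BYTES,
) -> bool:
    """Return True if an attachment passes the enable, type, and size checks."""
    if not enabled:
        return False
    # Drop parameters such as '; charset=utf-8' and normalize case.
    base_type = mime_type.split(";", 1)[0].strip().lower()
    if base_type not in allowed:
        return False
    return size_bytes <= max_bytes
```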

Image Ingestion

When attachment ingestion is enabled, images embedded in Confluence pages (PNG and JPEG) are ingested as attachments. Confluence stores editor-inserted images (drag/drop, paste, "Insert image") as regular page attachments, so they flow through the same path as PDFs and Office files. Images inserted as external URLs (rather than uploaded) are not attachments and are not ingested.

The connector also requests OCR-based ingestion for each image automatically (attachments.imageOcr is enabled by default), so chunks are produced without any scope-side configuration. Set attachments.imageOcr = disabled to defer to the destination scope's own ingestionConfig.jpgReadMode. Other image formats (GIF, WebP, SVG, HEIC, BMP, TIFF) are not currently supported by the Unique ingestion service and should be left out of allowedMimeTypes.

Skipped Content Types

Content types database, whiteboard, and embed are explicitly skipped (no body available via API). Folders are not explicitly skipped but have no body, so they are excluded during ingestion. In both cases, descendants (such as sub-pages under a database or folder) are still discovered and ingested. Live Docs pass through as regular pages. See the Content Type Ingestion Map for the full breakdown by platform.
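
The handling described above separates two decisions: whether an item's own body is ingested, and whether its descendants are still traversed. A hedged sketch follows; the function name and return shape are hypothetical, and the type strings are assumed to match the values returned by the Confluence API.

```python
from typing import Dict, Optional

# Content types that are explicitly skipped according to the text above.
SKIPPED_TYPES = frozenset({"database", "whiteboard", "embed"})


def plan_item(content_type: str, body: Optional[str]) -> Dict[str, bool]:
    """Decide how a discovered item is handled.

    'ingest' is whether the item's own body is ingested; 'descend' is
    whether its descendants are still discovered.
    """
    explicitly_skipped = content_type in SKIPPED_TYPES
    has_body = bool(body)
    return {
        # Folders are not in SKIPPED_TYPES but have no body, so they drop out here.
        "ingest": (not explicitly_skipped) and has_body,
        # Descendants are discovered in every case.
        "descend": True,
    }
```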

Scope Management

  • A pre-existing root scope is configured per tenant (must be created in Unique before the connector starts), with child scopes automatically created per Confluence space

  • See the Scope Hierarchy for details

Scheduled Synchronization

  • Sync runs on a configurable cron schedule (default: */15 * * * *, every 15 minutes)

  • An initial sync is triggered immediately on startup for each tenant

  • Concurrent sync runs for the same tenant are prevented (the second run is skipped)
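
Skipping an overlapping run for the same tenant can be sketched with a non-blocking per-tenant lock. This is a single-process illustration with hypothetical names; it does not describe the connector's actual scheduler.

```python
import threading
from typing import Callable, Dict


class TenantSyncGuard:
    """Runs a sync only if no sync for the same tenant is in progress."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, tenant: str) -> threading.Lock:
        # Create each tenant's lock exactly once, even under concurrent access.
        with self._registry_lock:
            return self._locks.setdefault(tenant, threading.Lock())

    def run_if_idle(self, tenant: str, sync: Callable[[], None]) -> bool:
        """Run sync() and return True, or return False if a run is active."""
        lock = self._lock_for(tenant)
        # Non-blocking acquire: an overlapping run is skipped, not queued.
        if not lock.acquire(blocking=False):
            return False
        try:
            sync()
            return True
        finally:
            lock.release()
```

A scheduler would call run_if_idle once at startup and then on every cron tick. Because the lock is per tenant, a long-running sync for one tenant does not block the others.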

Advanced Features

Multi-Tenancy

  • Multiple Confluence instances (tenants) can be configured in a single deployment, each with independent configuration, authentication, and sync schedules. See Architecture -- Multi-Tenancy Support for the isolation model and per-tenant service details.

Concurrency Control

  • Configurable page ingestion concurrency (default: 1)

  • Configurable API rate limits for both Confluence and Unique APIs
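
The effect of the ingestion concurrency setting can be illustrated with a bounded worker pool. The function and parameter names are hypothetical; with the default of 1, pages are processed strictly one at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ingest_pages(
    page_ids: Sequence[str],
    ingest_one: Callable[[str], T],
    concurrency: int = 1,
) -> List[T]:
    """Apply ingest_one to every page with at most `concurrency` in flight."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order regardless of completion order.
        return list(pool.map(ingest_one, page_ids))
```

Raising the value shortens a sync cycle for large spaces but increases load on both APIs, which is why it is paired with the configurable rate limits.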

Observability

  • Structured JSON logging

  • OpenTelemetry metrics integration

  • Prometheus metrics endpoint

Security

  • OAuth 2.0 two-legged (2LO) authentication for Cloud and Data Center

  • Personal Access Token (PAT) support for Data Center below 10.1 only (not recommended; use OAuth 2.0 2LO on 10.1+)

  • Configurable rate limiting for Confluence and Unique API calls

v1-Compatible Key Format

  • Optional useV1KeyFormat setting for backward compatibility with Confluence Connector v1 ingestion keys

How It Works

High-Level Sync Flow

[Figure: high-level sync flow diagram]

Content Sync Flow

[Figure: content sync flow diagram]

See Technical Reference for detailed architecture and flow documentation.

User Workflow

  1. Administrator Setup (One-time)

      • Deploy the connector

      • Configure tenant YAML with Confluence credentials and Unique API endpoints

      • Set up the root scope in Unique

  2. Confluence Users (Ongoing)

      • Apply the ai-ingest label to individual pages they want synchronized

      • Apply the ai-ingest-all label to a parent page to synchronize it and all its descendants

  3. Automated Processing

      • The connector scans for labeled pages on the configured schedule

      • Discovers pages and their attachments

      • Computes a diff against previously ingested content

      • Ingests new and updated content, removes deleted content

Limitations and Constraints

Not Supported

  • Real-time synchronization (periodic scanning only)

  • Permission synchronization (content sync only)

  • Confluence databases, whiteboards, and embeds (these content types are automatically skipped)

  • Hierarchical scope structure (all pages from a space are placed in a single flat scope; sub-scopes mirroring the Confluence page tree are not created)

Considerations

| Constraint | Impact | Mitigation |
| --- | --- | --- |
| Pages must be explicitly labeled | No automatic sync of unlabeled content | Document the labeling workflow for end users |
| Single ingestion mode (flat) | All pages from a space are ingested into a single scope per space | Organize content into separate spaces if scope separation is needed |
| Horizontal scaling not supported | Single instance deployment | Adequate resource allocation; per-tenant concurrency tuning |
| Concurrent sync prevention | If a sync cycle for a tenant is still running when the next is scheduled, the new cycle is skipped | Adjust cron interval or concurrency settings for large instances |

  • FAQ - Frequently asked questions and troubleshooting
