Confluence Connector - Flows
9 min read
Content Sync Flow
The content sync flow runs periodically (default: every 15 minutes) to synchronize labeled Confluence pages and their attachments to Unique.
Overview

Sequence Diagram
The connector is stateless -- it does not maintain local state between sync cycles. Change detection is performed by the Unique platform's file diff API, called once per Confluence space.

Scope Hierarchy
The connector manages a two-level scope hierarchy:
Root scope -- Must pre-exist in the Unique platform. Configured via
ingestion.scopeId. The connector grants itself access at initialization and marks the scope as owned by this Confluence instance on the first sync cycle. Subsequent sync cycles verify that the ownership mark still matches the configured Confluence instance.Space scopes -- Created automatically as children of the root scope, one per Confluence space key. Access is inherited from the root scope.
Root Scope Ownership Validation
To prevent misconfiguration where the same Confluence instance is connected to multiple Unique organizations, or the same root scope is pointed at a different Confluence instance, the connector tags the root scope with an identifier derived from the Confluence instance on the first sync cycle:
Cloud: the identifier is derived from the
cloudIdconfigured in the tenant YAML.Data Center: the identifier is fetched from the Data Center's application manifest endpoint (
/rest/applinks/1.0/manifest). No authentication is required.
On every subsequent sync cycle, the connector reads the ownership mark from the root scope and compares it against the current Confluence instance. A mismatch aborts the tenant sync cycle with a fatal error.
Discovery Phase
Pages are discovered through a CQL-based label search, then attachments are optionally extracted from the already-fetched page objects.
Important: CQL search results are filtered by the authenticated user's permissions. Pages in spaces the service account cannot access are silently excluded from results. On Data Center, the service account's space access can optionally be restricted to specific spaces. When no spaces are specified, the service account has read access to all spaces. If space restrictions are configured, pages in excluded spaces never appear in discovery results and no error is produced.
CQL Queries
The connector uses Confluence Query Language (CQL) to discover pages. The exact CQL differs by instance type:
Instance Type | Space Type Filter | CQL Template |
|---|---|---|
Cloud |
|
|
Data Center |
| Same template, different space type filter |
The space type filter varies by instance type. See the Configuration Guide for details on which space types are scanned per platform.
The type != attachment clause excludes attachments from top-level CQL results since they are fetched via the expand=children.attachment parameter on the page objects themselves.
Descendant Discovery
Pages labeled with ingestAllLabel trigger a descendant search:
Collect all page IDs that carry the
ingestAllLabelBatch IDs into groups of 100
For each batch, execute CQL:
ancestor IN ({batch}) AND type != attachmentDeduplicate results with labeled pages by page ID
Attachment Extraction
When attachments.mode is enabled (default), attachments are extracted from the already-fetched page objects. Confluence inlines up to 25 attachments per page via the expand=children.attachment parameter during the CQL search. If a page has more than 25 attachments, additional attachments are fetched via pagination (Cloud uses the v2 attachments endpoint; Data Center uses v1 pagination via _links.next).
An attachment is accepted if: - Its mediaType (reported by the Confluence API) is in the allowedMimeTypes list (defaults cover PDF, the major Office formats, plain text, CSV, HTML, PNG, and JPEG) - Its file size does not exceed maxFileSizeMb (default: 200 MB) - The maxItemsToScan capacity has not been exhausted (pages count first, attachments use remaining capacity)
Images embedded in a page body (via the editor's drag/drop, paste, or "Insert image" actions) are stored by Confluence as regular page attachments and surface in the same expand=children.attachment results, so they are ingested through this path when image/png or image/jpeg is in allowedMimeTypes. Images inserted as external URLs are not attachments and are not ingested.
When attachments.imageOcr is enabled (default), each image content registration includes ingestionConfig.jpgReadMode = DOC_INTELLIGENCE_DEFAULT, which the Unique ingestion service merges over its environment defaults and the destination scope's own config (request body has highest precedence). This forces OCR-based processing on the worker so chunks are produced; without it, the worker default (NO_INGESTION) returns zero chunks and the worker raises FAILED_IMAGE.
Content Type Ingestion Map
The connector uses label-based discovery via CQL. After fetching, it skips three content types: database, whiteboard, and embed.
Content that passes the filter has its body.storage HTML extracted and ingested. Items with empty bodies are skipped. Descendants of skipped content types (such as sub-pages under a database) are still discovered and ingested.
Confluence Cloud
Content Type | Ingested? | Body Available via API? | Notes |
|---|---|---|---|
Page | Yes | Yes ( | Primary content type. Full body ingestion. |
Blog Post | Yes | Yes ( | Treated identically to pages by the connector. |
Attachment | Yes (conditional) | No (binary) | Only when |
Whiteboard | No | No (no body via API) | Explicitly skipped. API returns no body content. Descendants are still discovered. |
Database | No | No (structured data, not exposed) | Explicitly skipped. No body via API. Descendants (sub-pages) are still discovered and ingested. |
Embed / Smart Link | No | No (only has | Explicitly skipped. Only contains a URL reference, no renderable body. |
Folder | No (effectively) | No (organizational container) | Not explicitly skipped, but has no body -- skipped by the empty-body filter. Descendants are still discovered. |
Comment (inline/footer) | No | Yes ( | Not discovered -- comments do not appear in label/ancestor CQL results. |
Live Doc (page subtype) | Yes (as page) | Yes ( | Subtype of page. Passes through as a regular page. |
Custom Content (app-defined) | No | Yes ( | Not discovered -- uses |
Task (standalone) | No | Yes ( | Not a CQL-searchable content type. Only accessible via |
Confluence Data Center
Content Type | Ingested? | Notes |
|---|---|---|
Page | Yes | Primary content type. Full body ingestion (storage format / XHTML). |
Blog Post | Yes | Treated identically to pages by the connector. |
Attachment | Yes (conditional) | Only when |
Comment (inline/footer) | No | Not discovered -- comments do not appear in label/ancestor CQL results. |
Custom Content (plugin-defined) | No | Accessed via plugin-specific REST APIs, not standard |
Whiteboard, Database, Embed, Folder, and Live Doc are Cloud-only features and do not exist in Data Center.
File Diff Mechanism
The connector uses the Unique platform's server-side file diff API (/v2/content/file-diff) to detect changes between sync cycles. The connector does not compute local content hashes -- instead, it sends each item's key, url, and updatedAt timestamp to the diff endpoint, which returns categorized results. The diff is called once per Confluence space.
Change Detection Logic

File Diff Item Attributes
Each item sent to the diff API contains:
Attribute | Description | Used For |
|---|---|---|
| Unique key identifying the item (derived from page and attachment IDs) | Identity and change tracking |
| Confluence URL of the page or attachment | Location tracking |
| Last modification timestamp from Confluence | Change detection |
Safety Checks
After each space's file diff is computed, two guards run to prevent accidental full deletion of content. Both abort the current tenant sync cycle when triggered.
Guard | Trigger Condition | Rationale |
|---|---|---|
Zero-submission guard | Discovery submitted 0 items but the diff contains deletions | Likely a bug in page fetching or an authentication failure that returned no results. Deleting all stored content would be destructive. |
Full-deletion guard | The diff would delete every file stored in Unique for the space and the connector determines the deletion is not a legitimate full content replacement | Likely a key format change (e.g., toggling |
Partial deletions (removing some but not all files) are always allowed. To intentionally remove all content for a tenant, set the tenant status to deleted (see Configuration -- Tenant Status).
Ingestion Pipeline
The ingestion pipeline follows a 3-step process for each item: register, upload, and finalize.
Page Ingestion
For each new or updated page:
Fetch content -- The connector retrieves the full page content (HTML storage representation). The page is skipped if it is not found, if fetching fails, or if the page body is empty.
Extract labels -- Confluence labels are extracted from the page, excluding the configured sync labels. Labels are sorted alphabetically for deterministic ordering.
Register -- Sends metadata to the Unique ingestion API.
Upload -- The page body (Confluence storage format HTML) is uploaded to Unique.
Finalize -- Triggers downstream processing in Unique.
If any step fails for a page, the error is logged and that page is skipped. Other pages continue processing.
Attachment Ingestion
For each new or updated attachment:
Skip zero-byte -- Attachments with zero file size are skipped.
Register -- Sends metadata to the Unique ingestion API.
Download -- Streams the attachment from Confluence.
Upload -- The stream is uploaded to Unique with the original content type and content length.
Finalize -- Triggers downstream processing in Unique.
If any step fails, the stream is destroyed and the error is logged. Other attachments continue processing.
Deletion
Deleted items identified by the file diff are processed after ingestion:
Build content keys for each deleted item
Resolve keys to content IDs
Delete content by IDs
If no content is found for the given keys, a warning is logged and no deletion occurs.
Removed Space Cleanup
The file diff runs once per Confluence space that appears in the current discovery results. Spaces that have disappeared from discovery entirely (because the space was deleted in Confluence, or all ingest labels were removed from its pages) are not diffed and would otherwise leave orphaned content and scopes in Unique.
After the per-item deletion step, the connector detects orphaned space scopes by comparing the root scope's children against the set of Confluence space keys discovered during the current sync. For each orphaned space, the connector deletes both the space's files and the space scope itself. A failure for one space does not prevent the cleanup of other orphaned spaces.
Concurrency and Progress
Ingestion concurrency is controlled by processing.concurrency (default: 1). Pages are ingested first, then attachments -- not in parallel. All items in a batch are attempted even if some fail.
Progress is logged periodically. After all items complete, a summary is logged with the total, succeeded, and failed counts.
Error Handling
Error Handling Strategy
The connector applies scenario-specific behavior to keep sync cycles stable:
Scenario | Typical Cause | Connector Behavior |
|---|---|---|
Sync already in progress | Overlapping cron triggers or long-running sync | Skip the cycle entirely (re-entrancy guard) |
Root scope not found | Misconfigured | Abort the entire sync cycle |
Root scope belongs to a different Confluence instance |
| Abort the entire tenant sync cycle |
Discovery failure | Error during page or attachment fetching from Confluence | Abort the sync cycle to avoid submitting partial results to file-diff, which could cause unnecessary deletions and re-ingestion costs |
Accidental full deletion detected | Bug in page fetching or key format change | Abort the entire tenant sync cycle |
Page unavailable | Page deleted between discovery and content fetch, or transient API error | Log, skip the page, continue other pages |
Page with empty body | Page has no content (e.g., newly created) | Log, skip the page |
Attachment with zero bytes | Empty attachment | Log, skip the attachment |
Page ingestion failure | Upload error, registration error | Clean up partial content from KB, log error, skip the page |
Attachment ingestion failure | Download error, upload error | Destroy stream, clean up partial content from KB, log error, skip the attachment |
Content deletion failure | Unique API error | Log error, continue |
Unhandled sync error | Unexpected exception | Caught at top level, logged, sync state reset |
Retry Logic

Transient HTTP errors are retried automatically with backoff. Requests follow redirects (up to 10 redirections). Non-2xx responses that exhaust retries are thrown as errors with the status code and response content.
Related Documentation
Technical Reference - Architecture overview, key concepts, ingestion pipeline
Architecture - System components and infrastructure
Configuration - Scheduler and processing settings
Standard References
Confluence Cloud REST API - Atlassian Confluence Cloud API documentation
Confluence Data Center REST API - Atlassian Confluence Data Center API documentation
Confluence Query Language (CQL) - CQL reference for content search queries