Confluence Connector - FAQ
General
What type of connector is this?
Answer: The Confluence Connector is a pull-based synchronization service that periodically scans Confluence spaces for labeled pages and syncs their content and attachments to the Unique knowledge base.
Key characteristics:
Runs on a configurable cron schedule (default: every 15 minutes)
Pulls content from Confluence via REST API
Requires explicit labeling of pages to trigger synchronization
Operates as a background service without user interaction
Supports both Confluence Cloud and Confluence Data Center
How does this differ from the Confluence Connector v1?
Answer:
| Aspect | v1 | v2 |
|---|---|---|
| Multi-tenancy | Not supported | Multiple Confluence instances in a single connector pod |
| Attachment ingestion | Not supported | Supported with configurable MIME types and size limits (includes embedded images) |
| Change detection | File-diff mechanism (pages only) | File-diff mechanism (pages and attachments) |
| Safety guards | None | Full-deletion prevention, concurrent sync prevention |
| Key format | | |
Labels and Page Discovery
How does the connector decide which pages to sync?
Answer: The connector uses two configurable Confluence labels: one for single-page sync (e.g. ai-ingest) and one for syncing a page and all its descendants (e.g. ai-ingest-all). Both labels must be explicitly set in the tenant configuration. See the README for a full overview of label-driven discovery and the technical flows documentation for the detailed CQL-based discovery process.
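As a rough illustration of label-driven discovery, the sketch below builds a CQL query that matches pages and blog posts carrying either label. The function name and the exact query shape are assumptions made for this example; the connector's real query is described in the technical flows documentation and may include further filters.

```python
def build_discovery_cql(single_label: str, descendants_label: str) -> str:
    """Build a CQL query matching pages and blog posts with either label.

    Hypothetical helper: the connector's actual query may differ.
    """
    labels = ", ".join(f'"{label}"' for label in (single_label, descendants_label))
    return f"label in ({labels}) AND type in (page, blogpost)"


print(build_discovery_cql("ai-ingest", "ai-ingest-all"))
```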
What happens when a page has both labels?
Answer: The page is deduplicated. The connector merges all labeled pages and their descendants into a single unique set (by page ID), so no page is ingested twice.
What happens when a page has ai-ingest and its ancestor has ai-ingest-all?
Answer: The page is discovered through both paths but deduplicated by ID, so it is ingested exactly once. See Descendant Discovery for details on how deduplication works.
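Deduplication by page ID can be pictured with the minimal sketch below. It assumes each discovered page is a dictionary with an id field; the function name and data shape are illustrative, not the connector's internal types.

```python
def merge_discovered(labeled_pages, descendants):
    """Merge both discovery paths into one list with unique page IDs."""
    unique = {}
    for page in [*labeled_pages, *descendants]:
        # The first occurrence wins; later duplicates of the same ID are ignored.
        unique.setdefault(page["id"], page)
    return list(unique.values())


labeled = [{"id": "1"}, {"id": "2"}]
found_as_descendants = [{"id": "2"}, {"id": "3"}]
print([page["id"] for page in merge_discovered(labeled, found_as_descendants)])
```

Page 2 is found through both paths in this example but appears once in the merged result.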
Which Confluence content types are synced?
Answer: The connector ingests page and blogpost content (including Live Docs, which are a page subtype). Attachments are ingested conditionally when attachments.mode=enabled. Content types database, whiteboard, and embed are explicitly skipped because their APIs expose no renderable body. Folders have no body and are effectively skipped by the empty-body filter. When a skipped content type (such as a database) carries the all-descendants label, its child pages are still discovered and ingested; only the skipped parent itself is excluded. See the Content Type Ingestion Map for the full Cloud and Data Center breakdown.
What format is the page content exported in?
Answer: Pages are fetched using the body.storage expansion, which returns the Confluence storage representation (HTML). The content is uploaded to Unique with MIME type text/html.
Are Confluence labels preserved during ingestion?
Answer: Yes. All labels on a page are included as metadata during ingestion, except for the two connector labels (ai-ingest and ai-ingest-all by default), which are filtered out. The remaining labels are sorted alphabetically for deterministic ordering.
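The filtering and ordering described above could look roughly like this. The function name is hypothetical, and the default label names are the ones used as examples in this FAQ.

```python
def metadata_labels(page_labels, connector_labels=("ai-ingest", "ai-ingest-all")):
    """Drop the connector's own labels and sort the rest alphabetically."""
    excluded = set(connector_labels)
    return sorted(label for label in page_labels if label not in excluded)


print(metadata_labels(["release-notes", "ai-ingest", "architecture"]))
```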
Which spaces are scanned?
Answer: Only global spaces are scanned (Cloud also includes collaboration spaces). Personal spaces are excluded on both platforms. See the Configuration Guide for full details on space type filtering per instance type.
Authentication
What authentication methods are supported?
Answer: The connector supports OAuth 2.0 (2LO) for Confluence Cloud and Data Center (10.1+), which is the recommended authentication method. Personal Access Token (PAT) is supported only on Data Center versions below 10.1 where OAuth 2.0 (2LO) is not available, and is not recommended. Cloud instances support only OAuth 2.0 (2LO). See the Authentication Guide for full details on each method, credential setup, and token flows.
How are secrets managed in configuration?
Answer: Secret values in tenant YAML configuration files use the os.environ/VARIABLE_NAME syntax to reference environment variables, resolved at startup. See Authentication -- Secret Resolution for the full mechanism, supported fields, and Kubernetes integration.
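A minimal sketch of how an os.environ/VARIABLE_NAME reference might be resolved is shown below. The function name is hypothetical, and raising an error for an unset variable is an assumption of this example; the Authentication guide describes the actual behavior.

```python
import os

SECRET_PREFIX = "os.environ/"


def resolve_secret(value, env=None):
    """Return the referenced environment variable, or the value itself if it is not a reference."""
    env = os.environ if env is None else env
    if not value.startswith(SECRET_PREFIX):
        return value
    name = value[len(SECRET_PREFIX):]
    if name not in env:
        # Assumption: a missing variable is treated as a startup error.
        raise KeyError(f"environment variable {name} is not set")
    return env[name]


print(resolve_secret("os.environ/CONFLUENCE_CLIENT_SECRET", {"CONFLUENCE_CLIENT_SECRET": "example"}))
```

CONFLUENCE_CLIENT_SECRET is a made-up variable name used only for the demonstration.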
Configuration
How are tenants configured?
Answer: Each tenant is configured via a YAML file following the naming convention <tenant-name>-tenant-config.yaml. The TENANT_CONFIG_PATH_PATTERN environment variable specifies a glob pattern to locate these files. Tenant names must match the pattern ^[a-z0-9]+(-[a-z0-9]+)*$ and must be unique across all config files. See the Configuration Guide for full details.
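The naming rules can be checked with a few lines of code. The sketch below derives the tenant name from a file path, validates it against the pattern quoted above, and rejects duplicates. The helper names and the example paths are illustrative only.

```python
import re
from pathlib import PurePath

TENANT_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SUFFIX = "-tenant-config.yaml"


def tenant_name_from_path(path):
    """Extract and validate the tenant name from a config file path."""
    filename = PurePath(path).name
    if not filename.endswith(SUFFIX):
        raise ValueError(f"{filename} does not end with {SUFFIX}")
    name = filename[: -len(SUFFIX)]
    if not TENANT_NAME.fullmatch(name):
        raise ValueError(f"invalid tenant name: {name!r}")
    return name


def load_tenant_names(paths):
    """Validate every path and make sure tenant names are unique."""
    names = [tenant_name_from_path(path) for path in paths]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate tenant names: {duplicates}")
    return names


print(load_tenant_names(["/config/acme-eu-tenant-config.yaml", "/config/acme-us-tenant-config.yaml"]))
```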
What are the available tenant statuses?
Answer:
| Status | Behavior |
|---|---|
| active | Tenant is loaded and sync is scheduled (default if not specified) |
| | Tenant config is validated but the tenant is not loaded |
| deleted | Ingested content is deleted from the Unique knowledge base and sync is stopped |
At least one tenant must have active or deleted status for the connector to start.
What are the key configuration sections?
Answer: Each tenant YAML file contains four top-level sections:
| Section | Purpose |
|---|---|
| confluence | Instance type, base URL, authentication, API rate limit, label names |
| unique | Unique API endpoints, authentication mode, rate limit |
| processing | Concurrency, cron schedule, optional scan limit |
| ingestion | Ingestion mode, scope ID, attachment settings, v1 key format toggle |
See the Configuration Guide for all available options and their defaults.
What are the default values for key settings?
Answer:
| Setting | Default Value |
|---|---|
| Processing concurrency | 1 |
| Scan interval cron | |
| Unique API rate limit | 100 requests/minute |
| Attachment ingestion | Enabled |
| Maximum attachment size | 200 MB |
| Store internally | Enabled |
| Use v1 key format | Disabled |
What attachment MIME types are allowed by default?
Answer: The defaults cover PDF (application/pdf), the major Office formats (DOCX, XLSX, PPTX), plain text, CSV, HTML, PNG, and JPEG. These can be overridden via ingestion.attachments.allowedMimeTypes (case-insensitive). The connector matches against the mediaType reported by the Confluence API rather than the filename extension, so renamed files are caught correctly. See Configuration -- Attachment Configuration for the full list.
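The allow-list check can be sketched as follows, combined with the size rules mentioned in the troubleshooting checklist (non-empty, at most the configured maximum). The MIME set below is only an illustrative subset of the defaults, the function name is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
# Illustrative subset of the default allow-list, not the full list.
EXAMPLE_ALLOWED_MIME_TYPES = {
    "application/pdf",
    "text/plain",
    "text/csv",
    "text/html",
    "image/png",
    "image/jpeg",
}


def is_attachment_allowed(media_type, size_bytes,
                          allowed=EXAMPLE_ALLOWED_MIME_TYPES, max_file_size_mb=200):
    """Check the reported mediaType (case-insensitive) and the size limits."""
    if size_bytes <= 0:
        return False  # zero-byte attachments are skipped
    if size_bytes > max_file_size_mb * 1024 * 1024:
        return False  # larger than the configured maximum
    return media_type.lower() in {entry.lower() for entry in allowed}


print(is_attachment_allowed("Application/PDF", 2048))
print(is_attachment_allowed("image/gif", 2048))
```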
Are embedded images ingested?
Answer: Yes. Images embedded in a Confluence page (drag/drop, paste, or "Insert image") are stored by Confluence as regular page attachments, so they flow through the same path as PDFs and Office files. With attachment ingestion enabled, PNG and JPEG images are ingested out of the box because both MIME types are in the default allowedMimeTypes list. The connector also requests OCR-based processing for each image (jpgReadMode = DOC_INTELLIGENCE_DEFAULT) by default, so chunks are produced without further scope-side configuration; this can be turned off via attachments.imageOcr = disabled. Other image formats (GIF, WebP, SVG, HEIC, BMP, TIFF) are not currently supported by the Unique ingestion service. Images inserted as external URLs (rather than uploaded) are not attachments and are not ingested.
How do I find my Atlassian Cloud ID?
Answer: The Cloud ID is required only for Confluence Cloud instances. You can find it by visiting:
https://your-domain.atlassian.net/_edge/tenant_info
The response contains a cloudId field with the UUID.
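If you prefer to script the lookup, a small sketch using only the Python standard library is shown below. The function names are illustrative and the sample UUID is made up; fetch_cloud_id needs network access to your site and is not called here.

```python
import json
from urllib.request import urlopen


def extract_cloud_id(body):
    """Read the cloudId field from a tenant_info JSON response."""
    return json.loads(body)["cloudId"]


def fetch_cloud_id(site):
    """Fetch tenant_info for a site such as 'your-domain.atlassian.net'."""
    with urlopen(f"https://{site}/_edge/tenant_info", timeout=10) as response:
        return extract_cloud_id(response.read().decode("utf-8"))


# Made-up sample response for demonstration purposes.
sample = '{"cloudId": "11111111-2222-3333-4444-555555555555"}'
print(extract_cloud_id(sample))
```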
Sync Behavior
What happens during a sync cycle?
Answer: Each sync cycle follows these steps:
Grant the service account access to the pre-existing root scope in Unique and resolve its path. On the first sync cycle the connector also marks the scope as owned by this tenant's Confluence instance; subsequent cycles verify that mark. The root scope must be created by an administrator before the connector can use it.
Discover all pages matching the configured labels via CQL search
Fetch descendant pages for any pages with the all-descendants label
Extract allowed attachments from discovered pages
Compute a file diff per space against Unique's stored state
Create child scopes in Unique for each space (using the space key as scope name)
Fetch and ingest new or updated pages (HTML storage representation)
Download and ingest new or updated attachments (streamed)
Delete items from Unique that are no longer discovered
Detect space scopes whose Confluence space is no longer discovered, and remove their files and scopes
How does change detection work?
Answer: The connector uses a server-side file diff mechanism that compares discovered items per space against the state stored in Unique, returning which items are new, updated, deleted, or moved. Only new and updated items are fetched and ingested. See the file diff mechanism documentation for the full details including item attributes, partial key format, and diagrams.
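To make the idea concrete, here is a simplified local sketch of a diff between discovered items and stored state. The real diff runs server-side in Unique; this example assumes each item is identified by a key and compared through a version marker, and it leaves out the moved category. All names are illustrative.

```python
def file_diff(discovered, stored):
    """Classify item keys as new, updated, or deleted.

    Both arguments map an item key to a version marker, for example a
    last-modified timestamp.
    """
    new = sorted(key for key in discovered if key not in stored)
    updated = sorted(
        key for key in discovered
        if key in stored and discovered[key] != stored[key]
    )
    deleted = sorted(key for key in stored if key not in discovered)
    return {"new": new, "updated": updated, "deleted": deleted}


discovered = {"page-1": "v2", "page-2": "v1", "page-4": "v1"}
stored = {"page-1": "v1", "page-2": "v1", "page-3": "v1"}
print(file_diff(discovered, stored))
```

Here page-4 is new, page-1 is updated, page-3 is deleted, and page-2 is unchanged, so only page-1 and page-4 would be fetched and ingested.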
What happens when a label is removed from a page?
Answer: If the ai-ingest label is removed from a page (and the page is not also covered by an ancestor's ai-ingest-all label), the page is no longer discovered during the scan. The file diff detects the page as missing and it is deleted from the Unique knowledge base on the next sync cycle. The same applies to any attachments on that page.
If the ai-ingest-all label is removed from a parent page, all descendant pages that were previously discovered solely through that label are no longer found. They are deleted from Unique on the next sync cycle, unless they carry their own ai-ingest label or are descendants of another ai-ingest-all-labeled page.
What happens when a page is deleted from Confluence?
Answer: If the page's space is still discovered during the next sync cycle, the file diff detects the missing page and deletes the corresponding content (page and its attachments) from Unique.
If an entire previously synced space disappears from discovery results (for example, because all its labels were removed or the space was deleted), the connector detects the orphaned space scope at the end of the sync cycle and removes both the space's files and the space scope itself. See Removed Space Cleanup for details.
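Conceptually, orphan detection is a set difference between the child scopes that exist under the root scope and the space keys discovered in the current cycle. The sketch below assumes child scopes are named after space keys, as described in this FAQ; the function name is hypothetical.

```python
def orphaned_space_scopes(existing_scope_names, discovered_space_keys):
    """Return child scopes whose Confluence space was not discovered this cycle."""
    return sorted(set(existing_scope_names) - set(discovered_space_keys))


print(orphaned_space_scopes(["ENG", "HR", "OPS"], ["ENG", "OPS"]))
```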
What happens to attachments when their parent page is unlabeled?
Answer: Attachments are discovered as children of labeled pages. If a page is no longer discovered (because its label was removed or the page was deleted), its attachments are also missing from the discovery results and are deleted from Unique via the file diff mechanism.
How are scopes organized in Unique?
Answer: Scopes follow a two-level hierarchy: a root scope configured per tenant, and child scopes automatically created for each Confluence space key. Child scopes inherit access from the root scope. See the Scope Hierarchy for details.
Safety and Deletion
What safety guards does the connector have?
Answer: The connector includes the following safeguards to prevent accidental data loss and misconfiguration:
Zero-submission guard: If discovery returns zero items for a space but the file diff would still delete content, the sync cycle is aborted. This prevents a transient Confluence error or a silent authentication failure from wiping ingested content.
Full-deletion guard: If the file diff would delete every file stored in Unique for a space, the sync cycle is aborted. If the connector determines the deletion is a legitimate full content replacement (rather than a misconfiguration), the sync proceeds with a warning instead of aborting.
Root scope ownership validation: Each root scope is tagged with the Confluence instance that owns it. If a scope was already claimed by a different Confluence instance, the sync for that tenant fails immediately, preventing two tenants from accidentally writing into the same scope.
See the safety checks and root scope ownership validation documentation for full details.
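The two deletion guards can be approximated by the sketch below. It is not the connector's implementation: the function and exception names are hypothetical, and because this FAQ does not explain how a legitimate replacement is recognized, that decision is passed in as a plain flag.

```python
class SyncAborted(Exception):
    """Raised when a guard stops the sync cycle."""


def check_deletion_guards(submitted_count, stored_count, deleted_count,
                          is_legitimate_replacement=False):
    """Return 'ok' or 'warn', or raise SyncAborted, for one space."""
    if submitted_count == 0 and deleted_count > 0:
        raise SyncAborted("zero-submission guard: nothing discovered but deletions pending")
    if stored_count > 0 and deleted_count >= stored_count:
        if is_legitimate_replacement:
            return "warn"  # proceed, but log a warning
        raise SyncAborted("full-deletion guard: every stored file would be deleted")
    return "ok"


print(check_deletion_guards(submitted_count=10, stored_count=12, deleted_count=2))
```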
What happens if I reassign the root scope to a different Confluence instance?
Answer: This is not supported. On the first sync cycle, the connector marks the root scope as owned by this tenant's Confluence instance. If the tenant is later reconfigured to point at a different Confluence instance while keeping the same scopeId, the next sync cycle detects the mismatch and aborts with a fatal error.
To move a tenant to a different Confluence instance, create a new root scope in Unique and configure it as the tenant's scopeId. The old scope and its content remain untouched and can be removed manually if no longer needed.
Are concurrent syncs for the same tenant possible?
Answer: No. If a sync cycle is already running for a tenant when the next scheduled cycle triggers, the new cycle is skipped.
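One simple way to get this skip-if-running behavior is an in-memory gate per tenant, sketched below. The class and method names are hypothetical, and the connector may use a different mechanism.

```python
import threading


class TenantSyncGate:
    """Tracks which tenants currently have a sync cycle in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_start(self, tenant):
        """Return True if a cycle may start, False if one is already running."""
        with self._lock:
            if tenant in self._running:
                return False
            self._running.add(tenant)
            return True

    def finish(self, tenant):
        """Mark the tenant's cycle as finished."""
        with self._lock:
            self._running.discard(tenant)


gate = TenantSyncGate()
print(gate.try_start("acme"))  # first trigger starts the cycle
print(gate.try_start("acme"))  # overlapping trigger is skipped
gate.finish("acme")
```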
Troubleshooting
Why aren't my pages syncing?
Checklist:
Does the page have the ai-ingest or ai-ingest-all label? (Check that the label names match your tenant configuration.)
Does the service account have access to the page's space? On Data Center, access can optionally be restricted to specific spaces. If space restrictions are configured, pages in excluded spaces are silently excluded from CQL results.
Is the page in a global space? (Cloud: also includes collaboration spaces.)
Is the page a standard page type? (Databases, whiteboards, and embeds are skipped.)
Does the page have a non-empty body? Pages with empty bodies are discovered but skipped during content ingestion.
Is the tenant status set to active in the YAML config?
Check connector logs for errors related to authentication, API rate limits, or Unique API failures.
Why aren't attachments being ingested?
Checklist:
Is attachment ingestion enabled? (ingestion.attachments.mode must be enabled, which is the default.)
Does the attachment's mediaType appear in the allowedMimeTypes list?
Is the file at most the configured maxFileSizeMb (default: 200 MB)?
Is the file size greater than 0 bytes? (Zero-byte attachments are skipped.)
If the attachment is an image (PNG or JPEG) and chunks are missing, check that attachments.imageOcr is enabled (default). When disabled, the connector defers to the destination scope's own ingestionConfig.jpgReadMode, which defaults to NO_INGESTION and produces zero chunks.
Check connector logs for attachment-specific errors.
Why do I see "Aborting to prevent accidental full deletion" errors?
Answer: This means the full-deletion safety guard was triggered. The guard aborts the sync when the file diff would delete every file stored for a space and the connector determines the deletion is not a legitimate content replacement. Possible causes:
A bug in page discovery returned zero results for a space (e.g., Confluence API issue, authentication failure for specific spaces)
The ingestion key format changed (e.g., useV1KeyFormat was toggled), causing the diff to see all existing keys as unrecognized
Resolution:
Check Confluence API connectivity and authentication
Verify that the useV1KeyFormat setting has not changed unexpectedly
If the key format change was intentional, the old content must be cleaned up manually before switching formats
If your intent really was to replace all pages in a space with a completely new set of pages, the connector detects this as a legitimate replacement and proceeds automatically. See Safety Checks for full details.
Why is sync taking too long?
Possible causes:
Large number of labeled pages and descendants
Large attachments being downloaded and uploaded
Low API rate limit configuration
Low processing concurrency
Solutions:
Increase processing.concurrency (default: 1)
Increase confluence.apiRateLimitPerMinute if the Confluence instance allows higher throughput
Increase unique.apiRateLimitPerMinute if the Unique platform allows higher throughput
Review labeled pages and reduce scope if necessary
Adjust processing.scanIntervalCron to allow more time between cycles
How does the connector handle errors during ingestion?
Answer: Individual item failures are logged and skipped without aborting the entire sync cycle. At the end of each batch, a summary is logged showing how many items succeeded and how many failed.
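The log-and-skip pattern can be sketched like this. The function names and the logger name are illustrative; the failing item in the demonstration is contrived.

```python
import logging

logger = logging.getLogger("confluence-sync-example")


def ingest_batch(items, ingest_one):
    """Ingest each item, skipping failures, and log a summary at the end."""
    succeeded = 0
    failed = 0
    for item in items:
        try:
            ingest_one(item)
            succeeded += 1
        except Exception:
            # A single failure is logged and does not abort the batch.
            logger.exception("Failed to ingest %s; skipping", item)
            failed += 1
    logger.info("Batch finished: %d succeeded, %d failed", succeeded, failed)
    return succeeded, failed


def demo_ingest(item):
    if item == "broken-page":
        raise RuntimeError("simulated upload failure")


print(ingest_batch(["page-1", "broken-page", "page-2"], demo_ingest))
```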
Multi-Tenancy
Can one connector serve multiple Confluence instances?
Answer: Yes. Each Confluence instance is configured as a separate tenant with its own YAML configuration file. All tenants run within a single connector deployment with independent authentication, API clients, and sync schedules. See Architecture -- Multi-Tenancy Support for details on tenant isolation and per-tenant service instances.
How do I add a new tenant?
Answer: Create a new YAML configuration file following the naming convention <tenant-name>-tenant-config.yaml in the directory matched by the TENANT_CONFIG_PATH_PATTERN environment variable. The connector must be restarted to pick up new tenant configuration files.
Can two tenants use the same scope ID?
Answer: No. Each root scope is tagged with the tenant that owns it. If a second tenant tries to use a scope already claimed by another tenant, the sync fails immediately.
Performance
What are the API rate limits?
Answer: Both Confluence and Unique API rate limits are independently configurable per tenant:
| API | Configuration Key | Default |
|---|---|---|
| Confluence | confluence.apiRateLimitPerMinute | No default (must be set) |
| Unique | unique.apiRateLimitPerMinute | 100 requests/minute |
Rate limiting is enforced client-side.
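This FAQ does not say which algorithm the connector uses for client-side limiting. As one possible approach, the sketch below spaces requests evenly so that no more than the configured number are sent per minute. The class name is hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class MinuteRateLimiter:
    """Evenly spaces calls so at most `per_minute` happen in any minute."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def acquire(self):
        """Block until the next request slot and return the time waited in seconds."""
        now = self._clock()
        wait = max(0.0, self._next_allowed - now)
        if wait:
            self._sleep(wait)
        # Reserve the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return wait
```

With per_minute set to 100, consecutive back-to-back calls end up spaced 0.6 seconds apart.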
What is the initial sync behavior?
Answer: An initial sync is triggered immediately on startup for each active tenant. After that, syncs follow the configured cron schedule.
Permissions
What Confluence permissions does the connector need?
Answer: The connector requires read access to the Confluence instance. OAuth 2.0 (2LO) and PAT credentials grant instance-wide read access — there is no way to scope them down to specific spaces or pages.
OAuth 2.0 (2LO): The OAuth application is configured with read access to the entire Confluence instance.
Personal Access Token (Data Center below 10.1 only; not recommended): The PAT inherits the permissions of the user who created it. Use OAuth 2.0 (2LO) on Data Center 10.1+ instead.
The connector discovers pages via CQL search queries filtered by label. Only pages carrying the configured sync labels are ingested.
What happens if the connector lacks permission to a space?
Answer: Pages in inaccessible spaces are silently excluded from CQL search results. The connector does not receive an error; it simply never discovers those pages.
If a space that was previously accessible becomes inaccessible, its content is cleaned up automatically at the end of the sync cycle. The connector detects the orphaned space scope and removes both its files and its scope. See Removed Space Cleanup for details.
How does Unique platform authentication work?
Answer: The connector supports two modes: cluster_local for in-cluster deployments (using service headers) and external for out-of-cluster deployments (using Zitadel OAuth credentials). See the Authentication Guide for setup details, required YAML fields, and token flows.
Resource Requirements
What are the resource requirements?
Answer: The default Helm chart values specify the following Kubernetes resource settings:
| Resource | Value |
|---|---|
| CPU request | 1 core |
| CPU limit | Not set |
| Memory request | 512 Mi |
| Memory limit | 1 Gi |
| Node.js max heap ( | 1920 MB |
These defaults are suitable for a single-tenant deployment with moderate page counts. For deployments with many tenants, large numbers of labeled pages, or high concurrency settings, consider increasing memory limits accordingly.
Related Documentation
README - Overview, features, and quick summary
Operator Guide - Deployment and operations
Authentication - Confluence and Unique auth setup
Configuration - Tenant config, environment variables, YAML settings
Technical Reference - Architecture, flows, and design decisions
Standard References
Confluence Cloud REST API - Atlassian Confluence Cloud API documentation
Confluence Data Center REST API - Atlassian Confluence Data Center API documentation
Confluence Query Language (CQL) - CQL reference for content search queries