Confluence Connector - FAQ


General

What type of connector is this?

Answer: The Confluence Connector is a pull-based synchronization service that periodically scans Confluence spaces for labeled pages and syncs their content and attachments to the Unique knowledge base.

Key characteristics:

Runs on a configurable cron schedule (default: every 15 minutes)
Pulls content from Confluence via REST API
Requires explicit labeling of pages to trigger synchronization
Operates as a background service without user interaction
Supports both Confluence Cloud and Confluence Data Center

How does this differ from the Confluence Connector v1?

Answer:

Aspect	v1	v2
Multi-tenancy	Not supported	Multiple Confluence instances in a single connector pod
Attachment ingestion	Not supported	Supported with configurable MIME types and size limits; embedded images are inlined into their page artifact
Change detection	File-diff mechanism (pages only)	File-diff mechanism (pages and attachments)
Safety guards	None	Full-deletion prevention, concurrent sync prevention
Key format	`spaceId_spaceKey/pageId`	`tenantName/spaceId_spaceKey/pageId` (v1 format available via `useV1KeyFormat`)
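The two key layouts in the table can be illustrated with a small helper. This is a sketch that assumes the key components are joined exactly as the table shows. `build_page_key` and its parameters are hypothetical names, not part of the connector.

```python
def build_page_key(
    tenant_name: str,
    space_id: str,
    space_key: str,
    page_id: str,
    use_v1_key_format: bool = False,
) -> str:
    """Return a page key in the v1 or v2 layout from the comparison table (hypothetical helper)."""
    # v1 layout: spaceId_spaceKey/pageId
    v1_key = f"{space_id}_{space_key}/{page_id}"
    if use_v1_key_format:
        return v1_key
    # v2 layout adds the tenant name as a prefix: tenantName/spaceId_spaceKey/pageId
    return f"{tenant_name}/{v1_key}"


print(build_page_key("acme", "98304", "ENG", "123456"))        # acme/98304_ENG/123456
print(build_page_key("acme", "98304", "ENG", "123456", True))  # 98304_ENG/123456
```

Because the prefix differs between the two layouts, toggling useV1KeyFormat changes every key for a tenant. This is why such a change trips the full-deletion guard described under Troubleshooting.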

Labels and Page Discovery

How does the connector decide which pages to sync?

Answer: The connector uses two configurable Confluence labels: one for single-page sync (e.g. ai-ingest) and one for syncing a page and all its descendants (e.g. ai-ingest-all). Both labels must be explicitly set in the tenant configuration. See the README for a full overview of label-driven discovery and the technical flows documentation for the detailed CQL-based discovery process.

What happens when a page has both labels?

Answer: It is ingested only once. The connector merges all labeled pages and their descendants into a single set, deduplicated by page ID.

What happens when a page has `ai-ingest` and its ancestor has `ai-ingest-all`?

Answer: The page is discovered through both paths but deduplicated by ID, so it is ingested exactly once. See Descendant Discovery for details on how deduplication works.
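Deduplication by page ID can be sketched as follows. This is a minimal illustration under stated assumptions: `Page` is a hypothetical record type, and the connector's real data structures are not described in this FAQ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical minimal page record."""
    id: str
    title: str


def merge_discovered(labeled: list[Page], descendants: list[Page]) -> list[Page]:
    """Merge directly labeled pages and descendants into one list, unique by page ID."""
    by_id: dict[str, Page] = {}
    for page in [*labeled, *descendants]:
        # The first occurrence of an ID wins; later copies are ignored.
        by_id.setdefault(page.id, page)
    return list(by_id.values())


# Page "1" carries ai-ingest-all; page "2" carries ai-ingest and is also a child of "1".
labeled = [Page("1", "Handbook"), Page("2", "Onboarding")]
descendants = [Page("2", "Onboarding"), Page("3", "First day")]
print([p.id for p in merge_discovered(labeled, descendants)])  # ['1', '2', '3']
```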

Which Confluence content types are synced?

Answer: The connector ingests page and blogpost content (including Live Docs, which are a page subtype). Attachments are ingested only when attachments.mode=enabled. The database, whiteboard, and embed content types are explicitly skipped because their APIs expose no renderable body. Folders have no body and are therefore dropped by the empty-body filter. When a skipped content type (such as a database) carries the all-descendants label, its child pages are still discovered and ingested; only the skipped parent itself is excluded. See the Content Type Ingestion Map for the full Cloud and Data Center breakdown.
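The skip rules, and the way a skipped parent's children are still reached, can be sketched as an in-memory tree walk. The assumptions are as follows. The `Node` type and the content type strings are hypothetical. The real connector discovers descendants through the Confluence API rather than from a prebuilt tree.

```python
from dataclasses import dataclass, field

# Content types listed above as skipped because they expose no renderable body.
SKIPPED_TYPES = {"database", "whiteboard", "embed"}


@dataclass
class Node:
    """Hypothetical content-tree node: a Confluence item and its children."""
    id: str
    type: str
    body: str = ""
    children: list["Node"] = field(default_factory=list)


def should_ingest(node: Node) -> bool:
    if node.type in SKIPPED_TYPES:
        return False
    # Empty-body filter: this also drops folders, which have no body.
    return bool(node.body.strip())


def ingestable_ids(root: Node) -> list[str]:
    """Depth-first walk that descends into children even when a parent is skipped."""
    result: list[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if should_ingest(node):
            result.append(node.id)
        stack.extend(reversed(node.children))  # reversed keeps document order
    return result


# A database labeled ai-ingest-all, containing a page and a folder with a nested page.
tree = Node("db", "database", children=[
    Node("p1", "page", "<p>Intro</p>"),
    Node("f1", "folder", children=[Node("p2", "page", "<p>Details</p>")]),
])
print(ingestable_ids(tree))  # ['p1', 'p2']
```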

What format is the page content exported in?

Answer: Pages are fetched using the body.storage expansion, which returns the Confluence storage representation (HTML). The content is uploaded to Unique with MIME type text/html.

Are Confluence labels preserved during ingestion?

Answer: Yes. All labels on a page are included as metadata during ingestion, except for the two connector labels (ai-ingest and ai-ingest-all by default), which are filtered out. The remaining labels are sorted alphabetically for deterministic ordering.
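The label filtering and ordering can be expressed in a few lines. This is a sketch only, and `metadata_labels` is a hypothetical name. Python's `sorted()` compares strings by Unicode code point; whether the connector uses the same collation is an assumption.

```python
def metadata_labels(
    page_labels: list[str],
    connector_labels: tuple[str, ...] = ("ai-ingest", "ai-ingest-all"),
) -> list[str]:
    """Drop the connector's own sync labels and sort the rest for deterministic order."""
    excluded = set(connector_labels)
    return sorted(label for label in page_labels if label not in excluded)


print(metadata_labels(["team-docs", "ai-ingest-all", "architecture"]))
# ['architecture', 'team-docs']
```

If the tenant configuration uses custom label names, the sketch expects those names to be passed as `connector_labels`.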

Which spaces are scanned?

Answer: Only global spaces are scanned (Cloud also includes collaboration spaces). Personal spaces are excluded on both platforms. See the Configuration Guide for full details on space type filtering per instance type.

Authentication

What authentication methods are supported?

Answer: The connector supports OAuth 2.0 (2LO) for Confluence Cloud and Data Center (10.1+), which is the recommended authentication method. Personal Access Token (PAT) is supported only on Data Center versions below 10.1 where OAuth 2.0 (2LO) is not available, and is not recommended. Cloud instances support only OAuth 2.0 (2LO). See the Authentication Guide for full details on each method, credential setup, and token flows.

How are secrets managed in configuration?

Answer: Secret values in tenant YAML configuration files use the os.environ/VARIABLE_NAME syntax to reference environment variables, resolved at startup. See Authentication -- Secret Resolution for the full mechanism, supported fields, and Kubernetes integration.
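A minimal sketch of the os.environ/VARIABLE_NAME convention follows. It assumes that values without the prefix are used literally and that a missing variable is a startup error. Both points are assumptions; see the Authentication Guide for the actual rules. The function `resolve_secret` and the variable `CONFLUENCE_CLIENT_SECRET` are hypothetical.

```python
import os

ENV_PREFIX = "os.environ/"


def resolve_secret(value: str) -> str:
    """Resolve an os.environ/NAME reference; return other values unchanged."""
    if not value.startswith(ENV_PREFIX):
        return value
    name = value[len(ENV_PREFIX):]
    if name not in os.environ:
        # Failing fast at startup is safer than sending an empty credential later.
        raise ValueError(f"environment variable {name!r} is not set")
    return os.environ[name]


# In Kubernetes this variable would typically be populated from a Secret.
os.environ["CONFLUENCE_CLIENT_SECRET"] = "example-value"
print(resolve_secret("os.environ/CONFLUENCE_CLIENT_SECRET"))  # example-value
```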

Configuration

How are tenants configured?

Answer: Each tenant is configured via a YAML file following the naming convention <tenant-name>-tenant-config.yaml. The TENANT_CONFIG_PATH_PATTERN environment variable specifies a glob pattern to locate these files. Tenant names must match the pattern ^[a-z0-9]+(-[a-z0-9]+)*$ and must be unique across all config files. See the Configuration Guide for full details.
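The naming rules can be checked with a short validator. This sketch assumes the tenant name is taken from the config filename. If the name is a field inside the YAML instead, the same regex and uniqueness check would apply to that field. `tenant_names` is a hypothetical helper.

```python
import re
from pathlib import PurePath

TENANT_NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
CONFIG_SUFFIX = "-tenant-config.yaml"


def tenant_names(config_paths: list[str]) -> list[str]:
    """Derive and validate tenant names from config file paths (hypothetical helper)."""
    names: list[str] = []
    for path in config_paths:
        filename = PurePath(path).name
        if not filename.endswith(CONFIG_SUFFIX):
            raise ValueError(f"{filename}: expected <tenant-name>{CONFIG_SUFFIX}")
        name = filename[: -len(CONFIG_SUFFIX)]
        if not TENANT_NAME_RE.fullmatch(name):
            raise ValueError(f"{filename}: invalid tenant name {name!r}")
        if name in names:
            raise ValueError(f"duplicate tenant name {name!r}")
        names.append(name)
    return names


print(tenant_names(["/config/acme-tenant-config.yaml", "/config/eu-prod-tenant-config.yaml"]))
# ['acme', 'eu-prod']
```

In a deployment, the path list would come from expanding the TENANT_CONFIG_PATH_PATTERN glob, for example with Python's `glob.glob`. Patterns that use `**` also need `recursive=True`.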

What are the available tenant statuses?

Answer:

Status	Behavior
`active`	Tenant is loaded and sync is scheduled (default if not specified)
`inactive`	Tenant config is validated but the tenant is not loaded
`deleted`	Ingested content is deleted from the Unique knowledge base and sync is stopped

At least one tenant must have active or deleted status for the connector to start.
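The status rules in the table can be sketched as a filter over parsed statuses. Here "loaded" is taken to mean the tenant is handed on either for syncing (active) or for cleanup (deleted); that reading is an interpretation of the table. `TenantStatus` and `tenants_to_load` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class TenantStatus(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    DELETED = "deleted"


def tenants_to_load(statuses: dict[str, Optional[str]]) -> dict[str, TenantStatus]:
    """Default to active, skip inactive tenants, and require at least one tenant to remain."""
    # TenantStatus(...) raises ValueError for an unknown status string.
    parsed = {name: TenantStatus(raw or "active") for name, raw in statuses.items()}
    loaded = {name: s for name, s in parsed.items() if s is not TenantStatus.INACTIVE}
    if not loaded:
        raise RuntimeError("at least one tenant must have status active or deleted")
    return loaded


print(tenants_to_load({"acme": None, "legacy": "inactive", "old-wiki": "deleted"}))
```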

What are the key configuration sections?

Answer: Each tenant YAML file contains four top-level sections:

Section	Purpose
`confluence`	Instance type, base URL, authentication, API rate limit, label names
`unique`	Unique API endpoints, authentication mode, rate limit
`processing`	Concurrency, cron schedule, optional scan limit
`ingestion`	Ingestion mode, scope ID, attachment settings, v1 key format toggle

See the Configuration Guide for all available options and their defaults.

What are the default values for key settings?

Answer:

Setting	Default Value
Processing concurrency	1
Scan interval cron	`*/15 * * * *` (every 15 minutes)
Unique API rate limit	100 requests/minute
Attachment ingestion	Enabled
Maximum attachment size	200 MB
Store internally	Enabled
Use v1 key format	Disabled

What attachment MIME types are allowed by default?

Answer: The defaults cover PDF (application/pdf), the major Office formats (DOCX, XLSX, PPTX), plain text, CSV, HTML, PNG, and JPEG. These can be overridden via ingestion.attachments.allowedMimeTypes (case-insensitive). The connector matches against the mediaType reported by the Confluence API rather than the filename extension, so renamed files are caught correctly. See Configuration -- Attachment Configuration for the full list.
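The attachment filter can be sketched as a single predicate. This predicate covers the MIME type, size, and zero-byte checks that also appear in the attachment troubleshooting checklist. `accept_attachment` is hypothetical. The MIME list is an illustrative subset, not the authoritative default list. Treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
# Illustrative subset of the defaults described above; see the Configuration
# guide for the authoritative list.
ALLOWED_MIME_TYPES = [
    "application/pdf",
    "text/plain",
    "text/csv",
    "text/html",
    "image/png",
    "image/jpeg",
]


def accept_attachment(
    media_type: str,
    size_bytes: int,
    allowed: list[str] = ALLOWED_MIME_TYPES,
    max_file_size_mb: int = 200,
) -> bool:
    """Decide from Confluence's reported mediaType and size, never from the filename."""
    if media_type.lower() not in {m.lower() for m in allowed}:
        return False
    if size_bytes <= 0:
        return False  # zero-byte attachments are skipped
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    return size_bytes <= max_file_size_mb * 1024 * 1024


print(accept_attachment("Application/PDF", 48_000))  # True: match is case-insensitive
print(accept_attachment("application/zip", 48_000))  # False: e.g. a ZIP renamed to report.pdf
print(accept_attachment("application/pdf", 0))       # False: zero-byte file
```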

Are embedded images ingested?

Answer: Yes. Each referenced PNG or JPEG attachment is inlined into the page HTML as a base64 data URI, so a page with images becomes a single self-contained ingestion artifact rather than one page plus N separate image artifacts. Images attached to another page in the same Confluence instance are resolved through the same path. Images inserted as external URLs are left in the HTML as-is and are never fetched by the connector.
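The core of inlining is base64-encoding the image bytes into a data URI and substituting it for the image macro. This sketch handles only the simplest same-page macro form. It uses a regex for brevity, whereas a real implementation would need an XML/HTML parser to handle cross-page references and macro attributes. `inline_images` and the exact `<img>` output are assumptions, not the connector's actual output.

```python
import base64
import html
import re

# Matches only the simplest same-page form of the storage-format image macro:
#   <ac:image><ri:attachment ri:filename="diagram.png" /></ac:image>
IMAGE_MACRO_RE = re.compile(
    r'<ac:image[^>]*>\s*<ri:attachment ri:filename="([^"]+)"\s*/>\s*</ac:image>'
)


def to_data_uri(mime_type: str, data: bytes) -> str:
    """Encode raw image bytes as a base64 data URI."""
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"


def inline_images(body: str, images: dict[str, tuple[str, bytes]]) -> str:
    """Replace resolvable image macros with <img> tags carrying data URIs."""

    def replace(match: re.Match) -> str:
        filename = match.group(1)
        if filename not in images:
            return match.group(0)  # cannot inline: keep the original macro
        mime_type, data = images[filename]
        return f'<img src="{to_data_uri(mime_type, data)}" alt="{html.escape(filename)}" />'

    return IMAGE_MACRO_RE.sub(replace, body)


page = '<p>Architecture:</p><ac:image><ri:attachment ri:filename="a.png" /></ac:image>'
print(inline_images(page, {"a.png": ("image/png", b"\x89PNG")}))
# <p>Architecture:</p><img src="data:image/png;base64,iVBORw==" alt="a.png" />
```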

What happens if an image cannot be inlined into its page?

Answer: When inlining is enabled, image attachments are not ingested as standalone artifacts. Orphan images (attached to the page but not referenced by an in-body macro) are appended to the end of the page body, so they are still inlined. A macro-referenced image that cannot be inlined keeps its original <ac:image> macro and is not ingested anywhere else. This happens on a download failure, when the image is larger than attachments.maxFileSizeMb, when its MIME type is not in allowedMimeTypes, or when a cross-page reference's target page or filename cannot be resolved. If the download failure was transient, the image is inlined on a later sync once the page is re-ingested. Inlining failures are logged at warn level with the page id and the referenced filename. The download uses the same rate-limited Confluence client as standalone attachment ingestion. A failure here therefore means the attachment is unreachable, and a separate standalone download would not have succeeded either.
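The fallback rules above can be sketched as a function that returns the reason an image cannot be inlined. The caller then keeps the macro and logs a warning with the page id and filename. `ImageRef`, the order of the checks, and the log wording are hypothetical; they are not the connector's actual types or log text.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("image-inlining-sketch")


@dataclass
class ImageRef:
    """Hypothetical description of one image referenced by a page."""
    filename: str
    resolved: bool                # False: cross-page target page or filename not found
    media_type: str = ""
    size_bytes: int = 0
    data: Optional[bytes] = None  # None: the download failed


def inline_failure_reason(ref: ImageRef, allowed: set[str], max_file_size_mb: int) -> Optional[str]:
    """Return why an image cannot be inlined, or None if it can (`allowed` is lowercase)."""
    if not ref.resolved:
        return "reference could not be resolved"
    if ref.media_type.lower() not in allowed:
        return f"MIME type {ref.media_type} not in allowedMimeTypes"
    if ref.size_bytes > max_file_size_mb * 1024 * 1024:
        return "larger than attachments.maxFileSizeMb"
    if ref.data is None:
        return "download failed"
    return None


def can_inline(page_id: str, ref: ImageRef, allowed: set[str], max_file_size_mb: int = 200) -> bool:
    reason = inline_failure_reason(ref, allowed, max_file_size_mb)
    if reason is not None:
        # The caller keeps the original <ac:image> macro in this case.
        log.warning("Cannot inline image page=%s filename=%s: %s", page_id, ref.filename, reason)
        return False
    return True
```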

What happens to image attachments that were ingested as separate artifacts before inlining was introduced?

Answer: They are not automatically removed. The file-diff mechanism only deletes content whose discovery key disappears from Confluence, and these artifacts still correspond to real attachments on real pages. The first sync after upgrading therefore produces an enriched page artifact alongside the pre-existing standalone image artifact, leaving duplicates in the destination scope. Operators who want a clean state should bulk-delete image attachments from the destination scope manually before or after the upgrade. Subsequent syncs will not re-create them.

How do I verify that page image inlining is working after a deployment?

Answer: Pick a Confluence page known to contain an inline image, run a sync, and inspect the ingested page artifact in the destination scope. The page body should contain data:image/png;base64, or data:image/jpeg;base64, and no <ac:image> macros. The attachment ingestion summary in the connector logs should report a count that excludes the inlined images. If the same image appears both inside a page artifact and as a separate attachment artifact in the same sync, double-check that attachments.mode is enabled in the tenant config.
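The inspection step can be automated with a small check. This sketch assumes you have already exported the ingested page body from the destination scope into a string; how to export it is outside this FAQ. `check_inlined` is a hypothetical helper. A leftover macro is not always an error, because an image that legitimately could not be inlined keeps its macro.

```python
import re

INLINED_IMAGE_RE = re.compile(r"data:image/(?:png|jpeg);base64,")


def check_inlined(page_html: str) -> list[str]:
    """Return a list of problems; an empty list means the page looks correctly inlined."""
    problems = []
    if not INLINED_IMAGE_RE.search(page_html):
        problems.append("no data:image/png or data:image/jpeg URI found")
    if "<ac:image" in page_html:
        problems.append("leftover <ac:image> macro (possibly an image that could not be inlined)")
    return problems


print(check_inlined('<p>x</p><img src="data:image/png;base64,iVBORw==" />'))  # []
```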

How do I find my Atlassian Cloud ID?

Answer: The Cloud ID is required only for Confluence Cloud instances. You can find it by visiting:

https://your-domain.atlassian.net/_edge/tenant_info

The response contains a cloudId field with the UUID.
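The same lookup can be scripted with the Python standard library. `fetch_cloud_id` and `parse_cloud_id` are hypothetical helpers, `your-domain.atlassian.net` is a placeholder, and the example UUID is made up.

```python
import json
import urllib.request


def parse_cloud_id(payload: str) -> str:
    """Extract the cloudId field from a tenant_info JSON response."""
    return json.loads(payload)["cloudId"]


def fetch_cloud_id(site: str) -> str:
    """Query https://<site>/_edge/tenant_info and return the Cloud ID."""
    url = f"https://{site}/_edge/tenant_info"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_cloud_id(response.read().decode("utf-8"))


# Offline example with a made-up response body:
print(parse_cloud_id('{"cloudId": "00000000-0000-0000-0000-000000000000"}'))
# Live lookup (requires network access):
# print(fetch_cloud_id("your-domain.atlassian.net"))
```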

Sync Behavior

What happens during a sync cycle?

Answer: Each sync cycle follows these steps:

Grant the service account access to the pre-existing root scope in Unique and resolve its path. On the first sync cycle the connector also marks the scope as owned by this tenant's Confluence instance; subsequent cycles verify that mark. The root scope must be created by an administrator before the connector can use it.
Discover all pages matching the configured labels via CQL search
Fetch descendant pages for any pages with the all-descendants label
Extract allowed attachments from discovered pages
Compute a file diff per space against Unique's stored state
Create child scopes in Unique for each space (using the space key as scope name)
Fetch and ingest new or updated pages (HTML storage representation)
Download and ingest new or updated attachments (streamed)
Delete items from Unique that are no longer discovered
Detect space scopes whose Confluence space is no longer discovered, and remove their files and scopes

How does change detection work?

Answer: The connector uses a server-side file diff mechanism that compares discovered items per space against the state stored in Unique, returning which items are new, updated, deleted, or moved. Only new and updated items are fetched and ingested. See the file diff mechanism documentation for the full details including item attributes, partial key format, and diagrams.
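To make the new/updated/deleted classification concrete, here is a client-side illustration. The real diff runs server-side in Unique, and the attributes it compares are described in the file diff documentation. This sketch assumes a single last-modified marker per key and omits "moved" detection. `file_diff` and `DiffResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DiffResult:
    new: list[str]
    updated: list[str]
    deleted: list[str]


def file_diff(discovered: dict[str, str], stored: dict[str, str]) -> DiffResult:
    """Compare item key -> last-modified markers for one space (client-side illustration)."""
    new = sorted(k for k in discovered if k not in stored)
    updated = sorted(k for k in discovered if k in stored and discovered[k] != stored[k])
    deleted = sorted(k for k in stored if k not in discovered)
    return DiffResult(new, updated, deleted)


stored = {"acme/98304_ENG/1": "2024-05-01T10:00Z", "acme/98304_ENG/2": "2024-05-01T10:00Z"}
discovered = {"acme/98304_ENG/1": "2024-05-03T08:30Z", "acme/98304_ENG/3": "2024-05-03T09:00Z"}
print(file_diff(discovered, stored))
# DiffResult(new=['acme/98304_ENG/3'], updated=['acme/98304_ENG/1'], deleted=['acme/98304_ENG/2'])
```

Only the keys in `new` and `updated` would then be fetched from Confluence and ingested.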

What happens when a label is removed from a page?

Answer: If the ai-ingest label is removed from a page (and the page is not also covered by an ancestor's ai-ingest-all label), the page is no longer discovered during the scan. The file diff detects the page as missing and it is deleted from the Unique knowledge base on the next sync cycle. The same applies to any attachments on that page.

If the ai-ingest-all label is removed from a parent page, all descendant pages that were previously discovered solely through that label are no longer found. They are deleted from Unique on the next sync cycle, unless they carry their own ai-ingest label or are descendants of another ai-ingest-all-labeled page.

What happens when a page is deleted from Confluence?

Answer: If the page's space is still discovered during the next sync cycle, the file diff detects the missing page and deletes the corresponding content (page and its attachments) from Unique.

If an entire previously synced space disappears from discovery results (for example, because all its labels were removed or the space was deleted), the connector detects the orphaned space scope at the end of the sync cycle and removes both the space's files and the space scope itself. See Removed Space Cleanup for details.

What happens to attachments when their parent page is unlabeled?

Answer: Attachments are discovered as children of labeled pages. If a page is no longer discovered (because its label was removed or the page was deleted), its attachments are also missing from the discovery results and are deleted from Unique via the file diff mechanism.

How are scopes organized in Unique?

Answer: Scopes follow a two-level hierarchy: a root scope configured per tenant, and child scopes automatically created for each Confluence space key. Child scopes inherit access from the root scope. See the Scope Hierarchy for details.

Safety and Deletion

What safety guards does the connector have?

Answer: The connector includes the following safeguards to prevent accidental data loss and misconfiguration:

Zero-submission guard: If discovery returns zero items for a space but the file diff would still delete content, the sync cycle is aborted. This prevents a transient Confluence error or a silent authentication failure from wiping ingested content.
Full-deletion guard: If the file diff would delete every file stored in Unique for a space, the sync cycle is aborted. If the connector determines the deletion is a legitimate full content replacement (rather than a misconfiguration), the sync proceeds with a warning instead of aborting.
Root scope ownership validation: Each root scope is tagged with the Confluence instance that owns it. If a scope was already claimed by a different Confluence instance, the sync for that tenant fails immediately, preventing two tenants from accidentally writing into the same scope.

See the safety checks and root scope ownership validation documentation for full details.
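The order of the two deletion guards can be sketched as follows. This FAQ does not describe how the connector decides that a full deletion is a legitimate replacement, so the sketch takes that decision as the `is_legitimate_replacement` input. The function name, parameters, and messages are hypothetical.

```python
import logging

log = logging.getLogger("safety-guard-sketch")


class SyncAborted(Exception):
    """Raised to abort the sync cycle before any deletion happens."""


def check_deletion_guards(
    space_key: str,
    submitted: int,
    stored: int,
    to_delete: int,
    is_legitimate_replacement: bool,
) -> None:
    """Apply the zero-submission and full-deletion guards to one space's diff."""
    if to_delete == 0:
        return
    # Zero-submission guard: nothing discovered, yet deletions are pending.
    if submitted == 0:
        raise SyncAborted(f"{space_key}: discovery returned nothing but {to_delete} deletions are pending")
    # Full-deletion guard: every stored file would be removed.
    if to_delete == stored:
        if is_legitimate_replacement:
            log.warning("%s: full replacement of %d stored items", space_key, stored)
            return
        raise SyncAborted(f"{space_key}: refusing to delete every stored file")
```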

What happens if I reassign the root scope to a different Confluence instance?

Answer: This is not supported. On the first sync cycle, the connector marks the root scope as owned by this tenant's Confluence instance. If the tenant is later reconfigured to point at a different Confluence instance while keeping the same scopeId, the next sync cycle detects the mismatch and aborts with a fatal error.

To move a tenant to a different Confluence instance, create a new root scope in Unique and configure it as the tenant's scopeId. The old scope and its content remain untouched and can be removed manually if no longer needed.
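The claim-then-verify behavior can be sketched in a few lines. The sketch assumes the owner marker is a plain string identifying the Confluence instance, such as its base URL; how and where the connector stores the marker is not described here. `claim_or_verify_owner` is a hypothetical name.

```python
from typing import Optional


class ScopeOwnershipError(Exception):
    """The root scope is already claimed by another Confluence instance."""


def claim_or_verify_owner(current_owner: Optional[str], instance_id: str) -> str:
    """Return the owner to store on the scope, or fail if it belongs to another instance."""
    if current_owner is None:
        return instance_id  # first sync cycle: claim the scope
    if current_owner != instance_id:
        raise ScopeOwnershipError(
            f"scope is owned by {current_owner!r}; refusing to sync {instance_id!r}"
        )
    return current_owner  # later cycles: ownership confirmed
```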

Are concurrent syncs for the same tenant possible?

Answer: No. If a sync cycle is already running for a tenant when the next scheduled cycle triggers, the new cycle is skipped.
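The skip-if-running behavior can be illustrated with a non-blocking lock. This is a Python sketch of the semantics only. The connector runs on Node.js, and this FAQ does not describe its actual mechanism. `TenantSyncRunner` is a hypothetical name.

```python
import threading
from typing import Callable


class TenantSyncRunner:
    """Run a tenant's sync, skipping a trigger while a previous cycle is still running."""

    def __init__(self, tenant_name: str, sync: Callable[[], None]) -> None:
        self.tenant_name = tenant_name
        self._sync = sync
        self._running = threading.Lock()

    def trigger(self) -> bool:
        """Return True if a cycle ran, False if it was skipped."""
        if not self._running.acquire(blocking=False):
            print(f"{self.tenant_name}: previous sync still running, skipping this cycle")
            return False
        try:
            self._sync()
            return True
        finally:
            self._running.release()
```

A trigger that arrives while `_sync()` is still executing returns immediately instead of queueing, which matches the "skipped, not delayed" behavior described above.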

Troubleshooting

Why aren't my pages syncing?

Checklist:

Does the page have the ai-ingest or ai-ingest-all label? (Check that the label names match your tenant configuration.)
Does the service account have access to the page's space? On Data Center, access can optionally be restricted to specific spaces. If such restrictions are configured, pages in spaces outside the allowed set are silently omitted from CQL results.
Is the page in a global space? (Cloud: also includes collaboration spaces.)
Is the content a page or blog post? (Databases, whiteboards, and embeds are skipped.)
Does the page have a non-empty body? Pages with empty bodies are discovered but skipped during content ingestion.
Is the tenant status set to active in the YAML config?
Check connector logs for errors related to authentication, API rate limits, or Unique API failures.

Why aren't attachments being ingested?

Checklist:

Is attachment ingestion enabled? (ingestion.attachments.mode must be enabled, which is the default.)
Does the attachment's mediaType appear in the allowedMimeTypes list?
Is the file no larger than the configured maxFileSizeMb (default: 200 MB)?
Is the file size greater than 0 bytes? (Zero-byte attachments are skipped.)
If the attachment is an image (PNG or JPEG) and chunks are missing, check that attachments.imageOcr is enabled (default). When disabled, the connector defers to the destination scope's own ingestionConfig.jpgReadMode, which defaults to NO_INGESTION and produces zero chunks.
Check connector logs for attachment-specific errors.

Why do I see "Aborting to prevent accidental full deletion" errors?

Answer: This means the full-deletion safety guard was triggered. The guard aborts the sync when the file diff would delete every file stored for a space and the connector determines the deletion is not a legitimate content replacement. Possible causes:

Page discovery returned zero results for a space (e.g., a Confluence API issue or an authentication failure for specific spaces)
The ingestion key format changed (e.g., useV1KeyFormat was toggled), causing the diff to see all existing keys as unrecognized

Resolution:

Check Confluence API connectivity and authentication
Verify that the useV1KeyFormat setting has not changed unexpectedly
If the key format change was intentional, the old content must be cleaned up manually before switching formats

If your intent really was to replace all pages in a space with a completely new set of pages, the connector detects this as a legitimate replacement and proceeds automatically. See Safety Checks for full details.

Why is sync taking too long?

Possible causes:

Large number of labeled pages and descendants
Large attachments being downloaded and uploaded
Low API rate limit configuration
Low processing concurrency

Solutions:

Increase processing.concurrency (default: 1)
Increase confluence.apiRateLimitPerMinute if the Confluence instance allows higher throughput
Increase unique.apiRateLimitPerMinute if the Unique platform allows higher throughput
Review labeled pages and reduce scope if necessary
Adjust processing.scanIntervalCron to allow more time between cycles

How does the connector handle errors during ingestion?

Answer: Individual item failures are logged and skipped without aborting the entire sync cycle. At the end of each batch, a summary is logged showing how many items succeeded and how many failed.
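Per-item isolation with a batch summary can be sketched as follows. `ingest_batch` and the log wording are hypothetical; the connector's own log messages may differ.

```python
import logging
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
log = logging.getLogger("ingestion-sketch")


def ingest_batch(items: Iterable[T], ingest_one: Callable[[T], None]) -> tuple[int, int]:
    """Ingest items one by one; a failure is logged and skipped, never fatal to the batch."""
    succeeded = failed = 0
    for item in items:
        try:
            ingest_one(item)
            succeeded += 1
        except Exception:  # deliberately broad: one bad item must not stop the cycle
            failed += 1
            log.exception("Failed to ingest %r", item)
    log.info("Batch finished: %d succeeded, %d failed", succeeded, failed)
    return succeeded, failed
```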

Multi-Tenancy

Can one connector serve multiple Confluence instances?

Answer: Yes. Each Confluence instance is configured as a separate tenant with its own YAML configuration file. All tenants run within a single connector deployment with independent authentication, API clients, and sync schedules. See Architecture -- Multi-Tenancy Support for details on tenant isolation and per-tenant service instances.

How do I add a new tenant?

Answer: Create a new YAML configuration file named <tenant-name>-tenant-config.yaml at a path matched by the glob pattern in the TENANT_CONFIG_PATH_PATTERN environment variable. The connector must be restarted to pick up new tenant configuration files.

Can two tenants use the same scope ID?

Answer: No. Each root scope is tagged with the tenant that owns it. If a second tenant tries to use a scope already claimed by another tenant, the sync fails immediately.

Performance

What are the API rate limits?

Answer: Both Confluence and Unique API rate limits are independently configurable per tenant:

API	Configuration Key	Default
Confluence	`confluence.apiRateLimitPerMinute`	No default (must be set)
Unique	`unique.apiRateLimitPerMinute`	100 requests/minute

Rate limiting is enforced client-side.
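A sliding-window limiter is one common way to enforce a per-minute budget on the client side. This FAQ does not state which algorithm the connector uses, so treat the following as an illustrative sketch. `SlidingWindowLimiter` is hypothetical. Creating one instance per tenant and per API mirrors the independent configuration keys in the table. The clock and sleep functions are injectable so that the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """At most `limit` calls (limit > 0) in any rolling window of `window_s` seconds."""

    def __init__(
        self,
        limit: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self) -> None:
        """Block until one more call fits into the window, then record it."""
        while True:
            now = self._clock()
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()
            if len(self._calls) < self.limit:
                self._calls.append(now)
                return
            self._sleep(self.window_s - (now - self._calls[0]))


# For example, one limiter per tenant for unique.apiRateLimitPerMinute: 100
unique_limiter = SlidingWindowLimiter(limit=100)
```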

What is the initial sync behavior?

Answer: An initial sync is triggered immediately on startup for each active tenant. After that, syncs follow the configured cron schedule.

Permissions

What Confluence permissions does the connector need?

Answer: The connector requires read access to the Confluence instance. OAuth 2.0 (2LO) and PAT credentials grant instance-wide read access — there is no way to scope them down to specific spaces or pages.

OAuth 2.0 (2LO): The OAuth application is configured with read access to the entire Confluence instance.
Personal Access Token (Data Center below 10.1 only; not recommended): The PAT inherits the permissions of the user who created it. Use OAuth 2.0 (2LO) on Data Center 10.1+ instead.

The connector discovers pages via CQL search queries filtered by label. Only pages carrying the configured sync labels, plus the descendants of pages carrying the all-descendants label, are ingested.
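For orientation, label-based and descendant discovery can be expressed with the standard CQL `label` and `ancestor` fields. These are illustrative queries, not the connector's exact ones; see the technical flows documentation for those. The helper names are hypothetical. The encoded query string could be sent, for example, to the content search endpoint (`/rest/api/content/search` on Data Center, `/wiki/rest/api/content/search` on Cloud).

```python
from urllib.parse import parse_qs, urlencode


def labeled_content_cql(single_label: str, all_label: str) -> str:
    """CQL for content carrying either sync label (content-type rules are applied afterwards)."""
    return f'label = "{single_label}" OR label = "{all_label}"'


def descendants_cql(page_id: str) -> str:
    """CQL for everything below a page that carries the all-descendants label."""
    return f"ancestor = {page_id}"


cql = labeled_content_cql("ai-ingest", "ai-ingest-all")
query = urlencode({"cql": cql, "limit": 50})  # URL-encoded query string for the search request
```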

What happens if the connector lacks permission to a space?

Answer: Pages in inaccessible spaces are silently excluded from CQL search results. The connector does not receive an error; it simply never discovers those pages.

If a space that was previously accessible becomes inaccessible, its content is cleaned up automatically at the end of the sync cycle. The connector detects the orphaned space scope and removes both its files and its scope. See Removed Space Cleanup for details.

How does Unique platform authentication work?

Answer: The connector supports two modes: cluster_local for in-cluster deployments (using service headers) and external for out-of-cluster deployments (using Zitadel OAuth credentials). See the Authentication Guide for setup details, required YAML fields, and token flows.

Resource Requirements

What are the resource requirements?

Answer: The default Helm chart values specify the following Kubernetes resource settings:

Resource	Value
CPU request	1 core
CPU limit	Not set
Memory request	512 Mi
Memory limit	1 Gi
Node.js max heap (`MAX_HEAP_MB`)	1920 MB

These defaults are suitable for a single-tenant deployment with moderate page counts. For deployments with many tenants, large numbers of labeled pages, or high concurrency settings, consider increasing memory limits accordingly.

Related Documentation

README - Overview, features, and quick summary
Operator Guide - Deployment and operations
Authentication - Confluence and Unique auth setup
Configuration - Tenant config, environment variables, YAML settings
Technical Reference - Architecture, flows, and design decisions

Standard References

Confluence Cloud REST API - Atlassian Confluence Cloud API documentation
Confluence Data Center REST API - Atlassian Confluence Data Center API documentation
Confluence Query Language (CQL) - CQL reference for content search queries
