Confluence Connector (OnPrem & Cloud) V1


The new Confluence Connector image is now available in the Azure Container Registry (ACR), with access by invitation. This version fixes the breaking changes between Confluence v8 and v9+ and adds support for Confluence Cloud. We highly recommend upgrading the connector as soon as possible.

Solution Overview

Disclaimer: The Confluence Connector is currently not compatible with MT; it is only compatible with STs and CMTs.

The Confluence Connector (CC) is a standalone, dockerized NodeJS application that runs on a configurable schedule and synchronizes the Confluence data with the Unique AI service.

The CC uses the Confluence REST API to fetch the data and the Unique Ingestion API to ingest it into Unique AI chat.
The Confluence Connector works with both OnPrem and Cloud instances. The instance type is set via the CONFLUENCE_INSTANCE_TYPE env variable.

Confluence users decide which pages get ingested by applying Confluence labels to them.

There are two labels to choose from that indicate if a page should be synced with Unique AI:

ai-ingest
Syncs the labeled page.
ai-ingest-all
Syncs the labeled page and all its sub-pages (recursively).

Pages whose label has been removed are deleted from Unique AI during the next sync.

The label names (ai-ingest and ai-ingest-all) can be changed via the INGEST_SINGLE_LABEL and INGEST_ALL_LABEL env variables.
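To illustrate the two label semantics, here is a minimal TypeScript sketch that computes which pages would be synced from an in-memory page tree. The PageNode type and resolvePagesToSync function are hypothetical and only mirror the behavior described above; the CC itself discovers labeled pages through the Confluence REST API.

```typescript
// Hypothetical in-memory page tree; the CC itself reads pages from the Confluence REST API.
interface PageNode {
  id: string;
  labels: string[];
  children: PageNode[];
}

// Returns the IDs of pages that would be synced:
// singleLabel selects the page itself, allLabel selects the page and all descendants.
function resolvePagesToSync(
  roots: PageNode[],
  singleLabel: string = "ai-ingest",
  allLabel: string = "ai-ingest-all",
): Set<string> {
  const selected = new Set<string>();

  // Adds a page and, recursively, every sub-page below it.
  const addSubtree = (node: PageNode): void => {
    selected.add(node.id);
    node.children.forEach(addSubtree);
  };

  // Walks the tree looking for labeled pages.
  const visit = (node: PageNode): void => {
    if (node.labels.includes(allLabel)) {
      addSubtree(node); // whole subtree covered, no need to descend further
      return;
    }
    if (node.labels.includes(singleLabel)) {
      selected.add(node.id); // only this page; children need their own labels
    }
    node.children.forEach(visit);
  };

  roots.forEach(visit);
  return selected;
}
```

Passing the values of INGEST_SINGLE_LABEL and INGEST_ALL_LABEL as the last two arguments models custom label names.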

The CC makes its API requests as a Confluence service user. We recommend creating a dedicated service user for the CC and giving it read access to the relevant pages and spaces. API tokens for Confluence Cloud can be created here: https://id.atlassian.com/manage-profile/security/api-tokens
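For reference, the credential types are sent differently: Atlassian documents OnPrem (Server/Data Center) personal access tokens as bearer tokens, while Confluence Cloud API tokens are used with HTTP basic auth (the user's e-mail address plus the token). The sketch below builds the matching Authorization header. The ConfluenceAuth type, base64Ascii and authHeader are hypothetical helpers, not necessarily how the CC does it internally; the small encoder only exists to keep the example free of runtime-specific APIs.

```typescript
// Hypothetical credential shapes for the three auth options the CC supports.
type ConfluenceAuth =
  | { kind: "pat"; pat: string } // OnPrem personal access token
  | { kind: "basic"; username: string; password: string } // OnPrem username/password (testing)
  | { kind: "cloud"; user: string; token: string }; // Cloud e-mail + API token

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 for ASCII input (sufficient for e-mail:token pairs; not UTF-8 safe).
function base64Ascii(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i += 3) {
    // Pack up to three bytes into 24 bits; missing bytes count as 0 (charCodeAt gives NaN).
    const n = (s.charCodeAt(i) << 16) | ((s.charCodeAt(i + 1) || 0) << 8) | (s.charCodeAt(i + 2) || 0);
    const remaining = s.length - i;
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += remaining > 1 ? B64[(n >> 6) & 63] : "=";
    out += remaining > 2 ? B64[n & 63] : "=";
  }
  return out;
}

// Builds the Authorization header value for a Confluence REST request.
function authHeader(auth: ConfluenceAuth): string {
  if (auth.kind === "pat") return `Bearer ${auth.pat}`;
  if (auth.kind === "basic") return `Basic ${base64Ascii(`${auth.username}:${auth.password}`)}`;
  return `Basic ${base64Ascii(`${auth.user}:${auth.token}`)}`;
}
```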

The CC uses the following CQL (Confluence Query Language) query to find the pages that should be synced:

```
cql=((label="ai-ingest") OR (label="ai-ingest-all")) AND space.type=global AND type != attachment
```
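As a sketch, the query can be parameterized with the configured label names and URL-encoded into a Confluence REST search request. buildIngestCql and buildSearchUrl are hypothetical helpers; the /rest/api/content/search path matches the local OnPrem example in the Local Setup section, and start/limit are that endpoint's pagination parameters.

```typescript
// Hypothetical helper: the CQL above, parameterized with the configured label names.
function buildIngestCql(singleLabel: string, allLabel: string): string {
  return `((label="${singleLabel}") OR (label="${allLabel}")) AND space.type=global AND type != attachment`;
}

// Hypothetical helper: URL-encodes the CQL into a REST v1 content search request.
// start/limit page through results; expand=metadata.labels includes labels in the response.
function buildSearchUrl(baseUrl: string, cql: string, start: number = 0, limit: number = 25): string {
  const params = [
    `cql=${encodeURIComponent(cql)}`,
    `expand=${encodeURIComponent("metadata.labels")}`,
    `start=${start}`,
    `limit=${limit}`,
  ];
  // Strip trailing slashes so ".../confluence/" and ".../confluence" behave the same.
  return `${baseUrl.replace(/\/+$/, "")}/rest/api/content/search?${params.join("&")}`;
}
```

On Confluence Cloud, the same v1 endpoints are typically served under the /wiki path of the site URL, so the base URL differs there.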

The CC goes through the labeled pages twice: once to collect the IDs of all pages that should be ingested, and once to ingest them.

First run:
The CC checks the collected pages against Unique's file-diff endpoint to determine which files are new or updated (to be ingested), which were deleted, and which were moved.
Second run:
The CC goes through all pages that need to be ingested, one by one, and ingests them via the Ingestion API.
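The classification performed in the first run can be illustrated locally. This sketch is not the file-diff endpoint's actual algorithm or response format: PageRef, Diff and diffPages are hypothetical, and "moved" is modeled here as an unchanged page version whose space changed.

```typescript
// Hypothetical page reference as collected in the first run.
interface PageRef {
  key: string; // unique page key (assumption)
  version: number; // Confluence page version
  spaceKey: string; // space the page currently lives in
}

interface Diff {
  toIngest: string[]; // new or updated pages
  deleted: string[]; // previously ingested, no longer found
  moved: string[]; // same version, different space
}

// Local illustration of the new/updated/deleted/moved classification.
function diffPages(previous: PageRef[], current: PageRef[]): Diff {
  const prevByKey = new Map<string, PageRef>(previous.map((p) => [p.key, p]));
  const currKeys = new Set<string>(current.map((p) => p.key));
  const diff: Diff = { toIngest: [], deleted: [], moved: [] };

  for (const page of current) {
    const old = prevByKey.get(page.key);
    if (!old || old.version !== page.version) {
      diff.toIngest.push(page.key); // new page, or content changed (re-ingest even if also moved)
    } else if (old.spaceKey !== page.spaceKey) {
      diff.moved.push(page.key);
    }
  }
  for (const page of previous) {
    if (!currKeys.has(page.key)) diff.deleted.push(page.key); // label removed or page gone
  }
  return diff;
}
```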

Three layers determine what gets ingested: only pages that carry one of the ingest labels, belong to a global space (private spaces are excluded), and are readable by the service user whose API token the CC uses.

Confluence storage format and special elements:

The CC ingests page content in Confluence's raw storage format (body.storage.value) rather than the rendered view. This means Confluence-specific XML elements (namespaced ac:, ri:, at: tags) - such as Jira macros, status badges, table-of-contents, and other structured macros - are passed through as-is without being resolved or rendered.

For example, a Jira ticket linked on a page is ingested as the raw macro markup, not the rendered ticket data (status, title, etc.):

<ac:structured-macro ac:name="jira"><ac:parameter ac:name="key">PROJ-123</ac:parameter></ac:structured-macro>

Previously, a stripTags function removed these Confluence-specific elements before ingestion. It has been removed so that the full storage content is preserved.

Note: body.view is intentionally not used, as it resolves macros and triggers requests to external systems (e.g., Jira API calls).
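A minimal sketch of requesting and passing through the storage body, assuming the Confluence REST v1 content endpoint (GET /rest/api/content/{id}?expand=body.storage). ContentResponse models only the fields used here, and storageUrl and extractStorageBody are hypothetical helpers; the value is returned untouched, matching the pass-through behavior described above.

```typescript
// Minimal subset of a Confluence content response requested with expand=body.storage.
interface ContentResponse {
  id: string;
  title: string;
  body?: { storage?: { value: string; representation: string } };
}

// Hypothetical helper: request URL for one page's storage-format body (OnPrem-style path).
function storageUrl(baseUrl: string, pageId: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/rest/api/content/${encodeURIComponent(pageId)}?expand=body.storage`;
}

// Returns the raw storage XHTML as-is: ac:/ri:/at: macros are neither resolved nor stripped.
function extractStorageBody(page: ContentResponse): string {
  const value = page.body?.storage?.value;
  if (value === undefined) {
    throw new Error(`Page ${page.id} has no body.storage; was expand=body.storage requested?`);
  }
  return value;
}
```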

Docker Image

The CC is available as a Docker image, both publicly on GHCR (version-tagged):

```
docker pull ghcr.io/unique-ag/unique/confluence-connector:2025.45-a9ae5
```

and privately on ACR (version- and latest-tagged):

```
docker pull uniquecr.azurecr.io/confluence-connector:latest
```

To get access to our private ACR, please send an access request to enterprise-support@unique.ai or discuss with your CS representative.

The Confluence Connector is included in Unique's release cycle.

If you use custom certificates, mount them into the container when using docker run and point NODE_EXTRA_CA_CERTS at the mounted file:

```
docker run --env-file .env --rm -it -p 8083:8083 -v $(pwd)/my_custom_ca.cert:/node/my_custom_ca.cert:z confluence-connector
```

General Recommendations

Authenticate with a PAT (Personal Access Token) or API token of a Confluence service user that has the necessary access rights.
Set TEST_MODE=true when running the CC for the first time to observe performance, duration, etc.
Use the CC's /sync endpoint to manually trigger a synchronization. Example:
```
GET localhost:8083/sync
```
Set CRON_SCHEDULE only after the first full (non-test) ingestion has finished. A nightly run suffices in most cases (0 1 * * * runs daily at 01:00).
Be conservative with the CONFLUENCE_TOKENS_PER_MINUTE rate limiter setting to avoid overloading your OnPrem instance or hitting Confluence Cloud's rate limits.
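To illustrate what CONFLUENCE_TOKENS_PER_MINUTE controls (1 request = 1 token), here is a minimal token-bucket sketch. It shows the general technique under that assumption and is not the CC's actual limiter; the injectable clock only exists to make the behavior easy to test.

```typescript
// Minimal token bucket: refills continuously at tokensPerMinute, capped at one minute's budget.
class TokenBucket {
  private readonly tokensPerMinute: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(tokensPerMinute: number, now: () => number = () => Date.now()) {
    this.tokensPerMinute = tokensPerMinute;
    this.now = now;
    this.tokens = tokensPerMinute; // start with a full bucket
    this.lastRefill = now();
  }

  // Adds tokens proportional to the elapsed time since the last refill.
  private refill(): void {
    const current = this.now();
    const elapsedMs = current - this.lastRefill;
    this.tokens = Math.min(this.tokensPerMinute, this.tokens + (elapsedMs / 60000) * this.tokensPerMinute);
    this.lastRefill = current;
  }

  // Consumes one token and returns true if a request may be sent now.
  tryRemoveToken(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A caller would check tryRemoveToken() before each Confluence request and wait briefly when it returns false. A lower budget trades a longer sync for less load on the Confluence instance.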

Requirements

The connector must be able to reach the Confluence OnPrem / Cloud installation and Unique AI.
The connector needs a Confluence user with read access to all spaces and pages that should be synced. It can authenticate via basic auth (username + password), a PAT (Personal Access Token), or an API token generated in Confluence.
For scoped access tokens, the following scopes are required:
```
read:confluence-content.all
read:confluence-space.summary
read:confluence-content.summary
read:confluence-content.permission
read:hierarchical-content:confluence
read:confluence-user
read:label:confluence
search:confluence
```
The connector must authenticate against the Unique Ingestion API with the service user provided by Zitadel.
The Confluence OnPrem server version must be 6.13.23 or higher.
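If you create a scoped API token, a quick pre-flight check against the scope list above can catch missing scopes before the first sync. REQUIRED_SCOPES and missingScopes are hypothetical; how you obtain the granted scope list (for example, from your own notes when creating the token) is up to you.

```typescript
// Scopes listed above as required for scoped Confluence access tokens.
const REQUIRED_SCOPES: readonly string[] = [
  "read:confluence-content.all",
  "read:confluence-space.summary",
  "read:confluence-content.summary",
  "read:confluence-content.permission",
  "read:hierarchical-content:confluence",
  "read:confluence-user",
  "read:label:confluence",
  "search:confluence",
];

// Hypothetical pre-flight check: returns the required scopes the token was not granted.
function missingScopes(granted: Iterable<string>): string[] {
  const have = new Set<string>(granted);
  return REQUIRED_SCOPES.filter((scope) => !have.has(scope));
}
```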

ENV Variables for the CC:

Here are example env variables for both OnPrem and Cloud (uncomment the block that matches your instance type):

```
PORT=8083
CRON_SCHEDULE=
INGEST_SINGLE_LABEL=ai-ingest
INGEST_ALL_LABEL=ai-ingest-all
TEST_MODE=true
DEBUG_MODE=true

ZITADEL_CLIENT_ID=confluence-onprem@example.com # service user from zitadel
ZITADEL_CLIENT_SECRET=1234512345 # generated for the service user
ZITADEL_OAUTH_TOKEN_URL=https://id.unique-ai.example.com/oauth/v2/token
ZITADEL_PROJECT_ID=123123 # from zitadel
ZITADEL_SERVICE_EXTRA_HEADERS='{"x-zitadel-instance-host": "id.unique-ai.unique.app"}'
UNIQUE_INGESTION_URL=https://gateway.unique-ai.example.com/ingestion/v1/content
UNIQUE_INGESTION_URL_GRAPHQL=https://gateway.unique-ai.example.com/ingestion/graphql
UNIQUE_SCOPE_ID=scope_123 # chat scope id
UNIQUE_SERVICE_AUTH_MODE="cluster_local"
UNIQUE_SERVICE_EXTRA_HEADERS='{"x-service-id": "confluence-connector", "x-company-id": "xxxxxx", "x-user-id": "xxxxxx"}'

CONFLUENCE_TOKENS_PER_MINUTE=250

# ONPREM
# CONFLUENCE_INSTANCE_TYPE="ONPREM"
# CONFLUENCE_URL=https://confluence.example.com
# CONFLUENCE_PAT=1234567890/0987654321

# CLOUD
# CONFLUENCE_INSTANCE_TYPE="CLOUD"
# CONFLUENCE_URL=https://example-instance.atlassian.net
# CONFLUENCE_CLOUD_ID=12345
# CONFLUENCE_CLOUD_USER="johndoe@example.com" # confluence service user
# CONFLUENCE_CLOUD_TOKEN="abc123"
# UNIQUE_TOKENS_PER_MINUTE=30
```

To configure the CC, the following env variables are available:

PORT (required)
The port of the CC. Default: 8083

ZITADEL_CLIENT_ID (required)
The Zitadel service user that has permission to ingest data into Unique AI

ZITADEL_CLIENT_SECRET (required)
The client secret generated for the Zitadel service user

CONFLUENCE_TOKENS_PER_MINUTE
Rate limiter for the API requests to Confluence. 1 request = 1 token. Default: 250

CONFLUENCE_URL (required)
The URL of your Confluence instance. For the local setup, this is http://localhost:1990/confluence
Important: Include the http / https prefix.

CONFLUENCE_INSTANCE_TYPE (required)

"ONPREM" or "CLOUD"

CONFLUENCE_PAT (OnPrem only; required unless CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD are set)
Personal Access Token of the Confluence service user. The CC makes its Confluence API requests as this user.

CONFLUENCE_USERNAME (onprem only)
CONFLUENCE_PASSWORD (onprem only)
Basic auth credentials, intended for testing. In the local setup, both are "admin".

CONFLUENCE_CLOUD_USER (cloud only)
The Confluence service user (e-mail address) with access to the relevant spaces.

CONFLUENCE_CLOUD_TOKEN (cloud only)
The API token generated for CONFLUENCE_CLOUD_USER.

CONFLUENCE_CLOUD_ID (cloud only)
You can get your cloud_id from https://<baseUrl>.atlassian.net/_edge/tenant_info

CRON_SCHEDULE
Defines how often the CC syncs the Confluence data with Unique AI, as a five-field cron expression (minute, hour, day of month, month, day of week), e.g. "0 1 * * *".

UNIQUE_INGESTION_URL (required)
The ingestion endpoint of Unique AI. Example: https://gateway.<baseUrl>/ingestion/v1/content, or the cluster-local service URL, e.g. http://node-ingestion.chat.svc.cluster.local:8091/v1/content
Important: Include the http / https prefix.

UNIQUE_SERVICE_AUTH_MODE (external or cluster_local)
Use cluster_local to avoid hairpinning (in-cluster traffic leaving through the public gateway and coming back in). If requests must leave the cluster, use external.

UNIQUE_SERVICE_EXTRA_HEADERS (json)
Extra headers in JSON format, e.g. for cluster_local auth mode: '{"x-service-id": "confluence-connector", "x-company-id": "xxxx", "x-user-id": "xxx"}'

INGEST_ALL_LABEL (required)
The confluence label that defines which page and its sub-pages will get ingested (recursively). Default: "ai-ingest-all"

INGEST_SINGLE_LABEL (required)
The confluence label that defines which page will get ingested. Default: "ai-ingest"

TEST_MODE
When test mode is set to true, the CC will run the process without ingesting. Default: false

ZITADEL_OAUTH_TOKEN_URL
The Zitadel token endpoint used to obtain a valid token for ingestion. Example: https://id.<baseUrl>/oauth/v2/token
Important: Include the http / https prefix.

ZITADEL_SERVICE_EXTRA_HEADERS (json)
Extra headers sent to Zitadel, if needed for external auth mode, e.g. '{"x-zitadel-instance-host": "id.<baseUrl>.unique.app"}'

ZITADEL_PROJECT_ID (required)
The ID of the Unique AI project in Zitadel for which the service user generates its token

UNIQUE_SCOPE_ID
The Unique AI knowledge base scope into which the data is ingested. If no scope ID is given, the connector auto-creates a scope for each space and ingests each space's documents into its respective scope. Note: You will need to give users access to the auto-created scopes.

DEBUG_MODE
When debug mode is set to true, all outputs are written into the log file. Default: false
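The rules above can be captured in a small validation step that fails fast on misconfiguration. This is a hypothetical sketch covering only a subset of the variables: ConnectorConfig, required and parseConfig are not part of the CC, the defaults are the ones documented above, and the CRON_SCHEDULE check only counts the five cron fields.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical validated config covering a subset of the variables above.
interface ConnectorConfig {
  port: number;
  instanceType: "ONPREM" | "CLOUD";
  confluenceUrl: string;
  tokensPerMinute: number;
  testMode: boolean;
  extraHeaders: Record<string, string>;
}

// Returns a non-empty value or throws with the variable name.
function required(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") throw new Error(`Missing required env variable ${name}`);
  return value;
}

function parseConfig(env: Env): ConnectorConfig {
  const instanceType = required(env, "CONFLUENCE_INSTANCE_TYPE");
  if (instanceType !== "ONPREM" && instanceType !== "CLOUD") {
    throw new Error(`CONFLUENCE_INSTANCE_TYPE must be "ONPREM" or "CLOUD", got "${instanceType}"`);
  }

  const confluenceUrl = required(env, "CONFLUENCE_URL");
  if (!/^https?:\/\//.test(confluenceUrl)) throw new Error("CONFLUENCE_URL must include the http/https prefix");

  if (instanceType === "ONPREM") {
    // OnPrem: a PAT, or username + password (testing).
    const hasBasic = Boolean(env["CONFLUENCE_USERNAME"] && env["CONFLUENCE_PASSWORD"]);
    if (!env["CONFLUENCE_PAT"] && !hasBasic) {
      throw new Error("ONPREM needs CONFLUENCE_PAT or CONFLUENCE_USERNAME/CONFLUENCE_PASSWORD");
    }
  } else {
    // Cloud: service user, API token and cloud ID are all needed.
    for (const name of ["CONFLUENCE_CLOUD_USER", "CONFLUENCE_CLOUD_TOKEN", "CONFLUENCE_CLOUD_ID"]) required(env, name);
  }

  const cron = env["CRON_SCHEDULE"]?.trim();
  if (cron && cron.split(/\s+/).length !== 5) throw new Error("CRON_SCHEDULE must have 5 fields, e.g. 0 1 * * *");

  // JSON.parse throws on malformed JSON, which surfaces quoting mistakes early.
  const headersRaw = env["UNIQUE_SERVICE_EXTRA_HEADERS"];
  const extraHeaders: Record<string, string> = headersRaw ? JSON.parse(headersRaw) : {};

  return {
    port: Number(env["PORT"] ?? "8083"),
    instanceType,
    confluenceUrl,
    tokensPerMinute: Number(env["CONFLUENCE_TOKENS_PER_MINUTE"] ?? "250"),
    testMode: env["TEST_MODE"] === "true",
    extraHeaders,
  };
}
```

Run it against your env values after your env loader has stripped the surrounding quotes; errors such as a missing http/https prefix then show up before the connector starts.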

Using proxies and custom certs

If you use proxies or custom certs, you have to define the relevant env variables. Example:

```
NODE_EXTRA_CA_CERTS="/node/my_custom_ca.cert"
HTTPS_PROXY="https://myproxy:8080"
NO_PROXY="localhost,*.mydomain"
```

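When deploying with helmfiles, you also need to mount the certificate into the container. Example:

```
extraVolumes:
  - name: custom-ca-cert
    secret:
      secretName: custom-ca-cert
extraVolumeMounts:
  - name: custom-ca-cert
    mountPath: /node/my_custom_ca.cert
    subPath: my_custom_ca.cert # mounts a single file; assumes the secret stores the cert under this key
    readOnly: true
```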

Delete and reset ingested files manually

Delete individual files

Use the following DELETE endpoint of the CC:

```
DELETE localhost:8083/content/:contentId
```

Reset entire scope

If /sync doesn't automatically delete ingested files, the cause may be a misconfiguration during testing that left the files associated with the wrong space, Confluence instance, project, etc.

You can use the following DELETE endpoint of the CC to manually trigger a reset which will delete all ingested confluence pages for a given scope id so you can start again with a clean slate:

```
DELETE localhost:8083/reset/:scopeId
```

/reset may need an additional parameter to identify the files correctly. For this, you can pass a partialKey in the request body. This can be your Confluence URL (the same value as CONFLUENCE_URL) or the space prefix (spaceId_spaceKey).

Examples, using the Confluence URL:

```
{
    "partialKey": "http://localhost:1990/confluence"
}
```

or the space prefix, i.e. the spaceId and the spaceKey (shortened space name) joined with "_":

```
{
    "partialKey": "3581239_TES"
}
```
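The delete and reset calls can be assembled programmatically as sketched below. CcRequest, spacePartialKey, resetRequest and deleteContentRequest are hypothetical helpers that only build the method, URL and body; send the result with any HTTP client (for example, fetch in Node 18+ with the body serialized as JSON, presumably with a Content-Type: application/json header).

```typescript
// Hypothetical description of a request to the CC's maintenance endpoints.
interface CcRequest {
  method: "DELETE";
  url: string;
  body?: { partialKey: string };
}

// Space prefix used as partialKey: spaceId and spaceKey joined with "_".
function spacePartialKey(spaceId: string | number, spaceKey: string): string {
  return `${spaceId}_${spaceKey}`;
}

// DELETE /reset/:scopeId, optionally narrowed by a partialKey in the body.
function resetRequest(ccBaseUrl: string, scopeId: string, partialKey?: string): CcRequest {
  const url = `${ccBaseUrl.replace(/\/+$/, "")}/reset/${encodeURIComponent(scopeId)}`;
  return partialKey ? { method: "DELETE", url, body: { partialKey } } : { method: "DELETE", url };
}

// DELETE /content/:contentId for a single ingested file.
function deleteContentRequest(ccBaseUrl: string, contentId: string): CcRequest {
  return { method: "DELETE", url: `${ccBaseUrl.replace(/\/+$/, "")}/content/${encodeURIComponent(contentId)}` };
}
```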

Example Helmfiles

Example helmfiles can be found in the release repo: confluence-connector.yaml

Local Setup

Set up the Atlassian Plugin SDK to run a local confluence instance:

Follow this guide: https://developer.atlassian.com/server/framework/atlassian-sdk/set-up-the-atlassian-plugin-sdk-and-build-a-project/

Follow it up to the "create a plugin" step; there is no need to create a plugin.

Alternatively, for Data Center: https://www.atlassian.com/software/confluence/download-archives

Run the Atlassian instance:

The `atlassian-confluence/server` folder contains a tutorial project for building a macro. The macro itself is irrelevant here; we only need the running server so we can access Confluence locally.

From `atlassian-confluence/server` folder run the command `atlas-run`.

This will take some time on the first run. When done, you should be able to reach your local confluence instance at `localhost:1990/confluence`

Credentials for login locally:

user: admin

password: admin

Here is an example REST API URL that fetches all pages with the `ai-ingest` or `ai-ingest-all` label and expands their labels (so they are included in the JSON response):

http://localhost:1990/confluence/rest/api/content/search?cql=(label=%22ai-ingest%22)%20OR%20(label=%22ai-ingest-all%22)&expand=metadata.labels&limit=1&start=0

Properties listed under `_expandable` are omitted from the response by default but can be expanded to include their data. For example, the page body is not returned unless you add a query parameter such as `&expand=body.view` (or `&expand=body.storage`, the format the CC ingests).

Run the Confluence scanner:

From `add-ins/atlassian-confluence` run

```
npm run start
```

Migration from GHCR to ACR

With the switch to ACR, we improved env variable naming, fixed the breaking changes introduced in Confluence v9, and added Cloud support, among other things.

As shown in the example section, many env variables now have clearer namespacing (e.g. CLIENT_ID => ZITADEL_CLIENT_ID, APP_PORT => PORT, etc.). Make sure to update your env variables accordingly.
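When migrating an existing env file, a small rename step helps avoid silently dropped settings. The map below contains only the two renames named above; the full list is not given in this article, so compare your file against the example env variables and extend the map. RENAMED_ENV_VARS and migrateEnv are hypothetical.

```typescript
// Old -> new env variable names. Only the two documented renames are listed; extend as needed.
const RENAMED_ENV_VARS: Record<string, string> = {
  CLIENT_ID: "ZITADEL_CLIENT_ID",
  APP_PORT: "PORT",
};

// Returns a copy with old names moved to their new names; an already-set new name wins.
function migrateEnv(env: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = { ...env };
  for (const [oldName, newName] of Object.entries(RENAMED_ENV_VARS)) {
    if (oldName in result) {
      if (!(newName in result)) result[newName] = result[oldName];
      delete result[oldName];
    }
  }
  return result;
}
```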

The image is now hosted on Unique's container registry (uniquecr.azurecr.io).