Confluence Connector (OnPrem & Cloud) V1
The new Confluence Connector image is now available in the Azure Container Registry (ACR) by invitation. This version fixes the breaking changes introduced between Confluence v8 and v9+ and adds Confluence Cloud support. We strongly recommend upgrading the connector as soon as possible.
Solution Overview
Disclaimer: The Confluence Connector is currently not compatible with MT; it is only compatible with STs and CMTs.
The Confluence Connector (CC) is a standalone, dockerized NodeJS application that runs on a configurable schedule and synchronizes Confluence data with the Unique AI service.
The CC uses the Confluence REST API to fetch the data and the Unique Ingestion API to ingest it into Unique AI chat.
The CC works with both OnPrem and Cloud instances. The instance type is set with the CONFLUENCE_INSTANCE_TYPE env variable.
Confluence users can use Confluence's label functionality to determine which pages get ingested.
There are two labels to choose from that indicate whether a page should be synced with Unique AI:
ai-ingest:
This label syncs only the labeled page.
ai-ingest-all:
This label syncs the labeled page and all its sub-pages (recursively).
Pages whose label has been removed are deleted from Unique AI on the next sync.
The label names (ai-ingest and ai-ingest-all) can be changed via the INGEST_SINGLE_LABEL and INGEST_ALL_LABEL env variables.
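The label rule above can be expressed as a small predicate. This is a minimal sketch, not the connector's code: the PageInfo shape (a page's own labels plus the labels of all its ancestors) and the shouldIngest helper are hypothetical, and the default label names mirror INGEST_SINGLE_LABEL and INGEST_ALL_LABEL.

```typescript
// Hypothetical page shape: the page's own labels plus the labels of all its ancestors.
interface PageInfo {
  id: string;
  labels: string[];
  ancestorLabels: string[];
}

// Defaults mirror the documented INGEST_SINGLE_LABEL / INGEST_ALL_LABEL values.
function shouldIngest(
  page: PageInfo,
  singleLabel = "ai-ingest",
  allLabel = "ai-ingest-all",
): boolean {
  // A page carrying either label is synced itself.
  if (page.labels.includes(singleLabel) || page.labels.includes(allLabel)) {
    return true;
  }
  // Sub-pages are synced recursively only when an ancestor carries the "all" label.
  return page.ancestorLabels.includes(allLabel);
}

const child: PageInfo = { id: "42", labels: [], ancestorLabels: ["ai-ingest-all"] };
console.log(shouldIngest(child)); // true
```

A page for which the predicate turns false (for example, because its label was removed) corresponds to a page that is deleted from Unique AI on the next sync.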
The CC uses a Confluence service user to make API requests. We recommend creating this service user specifically for the CC and giving it the appropriate access rights to the relevant pages and spaces. API tokens for Confluence Cloud can be created here: https://id.atlassian.com/manage-profile/security/api-tokens
The CC uses the following CQL (Confluence Query Language) query to get the pages that should be synced:
cql=((label="ai-ingest") OR (label="ai-ingest-all")) AND space.type=global AND type != attachment
The CC runs through all labeled pages twice: once to find the IDs of all pages that should be ingested, and once to ingest them.
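Since the label names are configurable, the CQL above is effectively a template. The sketch below assumes the connector substitutes the values of INGEST_SINGLE_LABEL and INGEST_ALL_LABEL into it; buildIngestCql is a hypothetical name. The query must be URL-encoded when it is passed as the cql parameter.

```typescript
// Builds the label-based CQL query from the configured label names.
// Assumption: the connector fills in INGEST_SINGLE_LABEL / INGEST_ALL_LABEL here.
function buildIngestCql(singleLabel: string, allLabel: string): string {
  return (
    `((label="${singleLabel}") OR (label="${allLabel}"))` +
    " AND space.type=global AND type != attachment"
  );
}

// The CQL is URL-encoded before it is appended as the cql query parameter.
const cql = buildIngestCql("ai-ingest", "ai-ingest-all");
const queryString = `cql=${encodeURIComponent(cql)}`;
console.log(cql);
console.log(queryString);
```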
First query run:
The CC checks the collected pages against Unique's file-diff endpoint to determine which files are new or updated (to be ingested), which were deleted, and which were moved.
Second query run:
The CC goes through all pages that need to be ingested one by one and ingests them via the Ingestion API.
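To make the classification of the first run concrete, the following sketch reproduces locally the kind of decision the file-diff endpoint makes. It is illustrative only: the real comparison happens in Unique's file-diff endpoint, and the PageRef fields (key, updatedAt, path) and diffPages are assumptions, not that endpoint's API.

```typescript
// Hypothetical record of a page: a stable key, its last modification, and its location.
interface PageRef {
  key: string;
  updatedAt: string;
  path: string;
}

interface DiffResult {
  toIngest: string[]; // new or updated pages
  deleted: string[];  // previously ingested, no longer selected
  moved: string[];    // unchanged content, new location
}

function diffPages(current: PageRef[], ingested: PageRef[]): DiffResult {
  const previous = new Map(ingested.map((p): [string, PageRef] => [p.key, p]));
  const seen = new Set<string>();
  const result: DiffResult = { toIngest: [], deleted: [], moved: [] };

  for (const page of current) {
    seen.add(page.key);
    const old = previous.get(page.key);
    if (!old || old.updatedAt !== page.updatedAt) {
      // New or modified pages are ingested in the second run.
      result.toIngest.push(page.key);
    } else if (old.path !== page.path) {
      result.moved.push(page.key);
    }
  }
  // Anything ingested before but not selected now gets deleted.
  for (const old of ingested) {
    if (!seen.has(old.key)) result.deleted.push(old.key);
  }
  return result;
}
```

In this sketch, a page that was both edited and moved is simply re-ingested; only unchanged pages with a new location count as moved.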
Three layers determine what gets ingested. A page is ingested only if it is labeled, belongs to a global space (private spaces are excluded), and is readable by the service user with its API token.
Confluence storage format and special elements:
The CC ingests page content in Confluence's raw storage format (body.storage.value) rather than the rendered view. This means Confluence-specific XML elements (namespaced ac:, ri:, and at: tags), such as Jira macros, status badges, tables of contents, and other structured macros, are passed through as-is without being resolved or rendered.
For example, a Jira ticket linked on a page is ingested as the raw macro markup, not the rendered ticket data (status, title, etc.):
<ac:structured-macro ac:name="jira"><ac:parameter ac:name="key">PROJ-123</ac:parameter></ac:structured-macro>
Previously, a stripTags function removed these Confluence-specific elements before ingestion. That function has been removed so that the full storage content is preserved.
Note: body.view is intentionally not used, as it resolves macros and triggers requests to external systems (e.g., Jira API calls).
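To see which Confluence-specific macros a page will carry into Unique AI, a quick scan of its storage-format body is often enough. The helper below is hypothetical and regex-based (fine for inspection, not a replacement for an XML parser); it is not part of the connector.

```typescript
// Hypothetical helper: lists the ac:name of every structured macro in a
// storage-format body (body.storage.value), i.e. markup the CC passes through as-is.
function listStructuredMacros(storage: string): string[] {
  // Only opening <ac:structured-macro ...> tags are matched; [^>]* keeps the
  // search for ac:name inside that tag, so nested ac:parameter names are ignored.
  const pattern = /<ac:structured-macro\b[^>]*\bac:name="([^"]+)"/g;
  return Array.from(storage.matchAll(pattern), (match) => match[1]);
}

const example =
  '<ac:structured-macro ac:name="jira"><ac:parameter ac:name="key">PROJ-123</ac:parameter></ac:structured-macro>';
console.log(listStructuredMacros(example));
```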
Docker Image
The CC is available as a Docker image, both publicly on GHCR (version-tagged):
docker pull ghcr.io/unique-ag/unique/confluence-connector:2025.45-a9ae5
and privately on ACR (version- and latest-tagged):
docker pull uniquecr.azurecr.io/confluence-connector:latest
To get access to our private ACR, please send an access request to enterprise-support@unique.ai or contact your CS representative.
The Confluence Connector is included in Unique's release cycle.
If you are using custom certs, don't forget to mount them when using docker run:
docker run --env-file .env --rm -it -p 8083:8083 -v $(pwd)/my_custom_ca.cert:/node/my_custom_ca.cert:z confluence-connector
General Recommendations
Authenticate the Confluence service user with a PAT (Personal Access Token) or API token that has the necessary access rights.
Set TEST_MODE=true for the first run to observe performance, duration, etc.
Use the CC's /sync endpoint to trigger a synchronization manually. Example: GET localhost:8083/sync
Set CRON_SCHEDULE only after the first full (non-test) ingestion has finished. Once a night suffices in most cases (0 1 * * *).
Be conservative with the CONFLUENCE_TOKENS_PER_MINUTE rate limiter setting to avoid overloading your OnPrem instance or hitting Confluence Cloud's rate limits.
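The effect of CONFLUENCE_TOKENS_PER_MINUTE can be pictured as a token bucket. The sketch assumes a bucket that starts full and refills continuously at the configured rate, with 1 request = 1 token; the connector's actual limiter is not documented here, and TokenBucket and limited are hypothetical names.

```typescript
// Token bucket for a tokens-per-minute budget (1 request = 1 token).
// Assumption: the bucket starts full and refills continuously.
class TokenBucket {
  private readonly tokensPerMinute: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(tokensPerMinute: number, now: () => number = Date.now) {
    this.tokensPerMinute = tokensPerMinute;
    this.now = now;
    this.tokens = tokensPerMinute;
    this.lastRefill = now();
  }

  // Adds the tokens earned since the last refill, capped at one minute's budget.
  private refill(): void {
    const t = this.now();
    const earned = ((t - this.lastRefill) * this.tokensPerMinute) / 60_000;
    this.tokens = Math.min(this.tokensPerMinute, this.tokens + earned);
    this.lastRefill = t;
  }

  // Consumes one token and returns true if a request may be sent now.
  tryTake(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Milliseconds to wait before the next token becomes available.
  msUntilNextToken(): number {
    this.refill();
    return this.tokens >= 1 ? 0 : Math.ceil(((1 - this.tokens) * 60_000) / this.tokensPerMinute);
  }
}

// Waits for a token before running each request.
async function limited<T>(bucket: TokenBucket, request: () => Promise<T>): Promise<T> {
  while (!bucket.tryTake()) {
    await new Promise<void>((resolve) => setTimeout(resolve, bucket.msUntilNextToken()));
  }
  return request();
}
```

Under this model, a lower value only stretches the sync over more time: requests wait for a token rather than fail.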
Requirements
The connector must be able to reach the Confluence OnPrem or Cloud instance and Unique AI.
The connector must have a user with read access to all spaces and pages that should be synced. Authentication can use basic auth (username + password), a PAT (Personal Access Token), or an API token generated in Confluence.
For scoped access tokens, the following scopes are required:
read:confluence-content.all
read:confluence-space.summary
read:confluence-content.summary
read:confluence-content.permission
read:hierarchical-content:confluence
read:confluence-user
read:label:confluence
search:confluence
The connector must authenticate against the Unique Ingestion API with the service user provided by Zitadel.
The Confluence OnPrem server version must be 6.13.23 or higher.
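When creating a scoped token, a missing scope is easy to overlook. A minimal checklist sketch: REQUIRED_SCOPES copies the list above, missingScopes is a hypothetical helper, and how you obtain the granted scopes (for example from the token's configuration page) is up to you.

```typescript
// Scopes listed on this page as required for scoped Confluence Cloud API tokens.
const REQUIRED_SCOPES: readonly string[] = [
  "read:confluence-content.all",
  "read:confluence-space.summary",
  "read:confluence-content.summary",
  "read:confluence-content.permission",
  "read:hierarchical-content:confluence",
  "read:confluence-user",
  "read:label:confluence",
  "search:confluence",
];

// Returns the required scopes that are missing from a token's granted scopes.
function missingScopes(granted: Iterable<string>): string[] {
  const have = new Set(granted);
  return REQUIRED_SCOPES.filter((scope) => !have.has(scope));
}

// Example: a token created with only the search scope.
console.log(missingScopes(["search:confluence"]));
```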
ENV Variables for the CC:
Here are example env variables for both OnPrem and Cloud:
PORT=8083
CRON_SCHEDULE=
INGEST_SINGLE_LABEL=ai-ingest
INGEST_ALL_LABEL=ai-ingest-all
TEST_MODE=true
DEBUG_MODE=true
ZITADEL_CLIENT_ID=confluence-onprem@example.com # service user from zitadel
ZITADEL_CLIENT_SECRET=1234512345 # generated for the service user
ZITADEL_OAUTH_TOKEN_URL=https://id.unique-ai.example.com/oauth/v2/token
ZITADEL_PROJECT_ID=123123 # from zitadel
ZITADEL_SERVICE_EXTRA_HEADERS='{"x-zitadel-instance-host": "id.unique-ai.unique.app"}'
UNIQUE_INGESTION_URL=https://gateway.unique-ai.example.com/ingestion/v1/content
UNIQUE_INGESTION_URL_GRAPHQL=https://gateway.unique-ai.example.com/ingestion/graphql
UNIQUE_SCOPE_ID=scope_123 # chat scope id
UNIQUE_SERVICE_AUTH_MODE="cluster_local"
UNIQUE_SERVICE_EXTRA_HEADERS='{"x-service-id": "confluence-connector", "x-company-id": "xxxxxx", "x-user-id": "xxxxxx"}'
CONFLUENCE_TOKENS_PER_MINUTE=250
# ONPREM
# CONFLUENCE_INSTANCE_TYPE="ONPREM"
# CONFLUENCE_URL=https://confluence.example.com
# CONFLUENCE_PAT=1234567890/0987654321
# CLOUD
# CONFLUENCE_INSTANCE_TYPE="CLOUD"
# CONFLUENCE_URL=https://example-instance.atlassian.net
# CONFLUENCE_CLOUD_ID=12345
# CONFLUENCE_CLOUD_USER="johndoe@example.com" # confluence service user
# CONFLUENCE_CLOUD_TOKEN="abc123"
# UNIQUE_TOKENS_PER_MINUTE=30
To configure the CC, the following env variables are available:
PORT (required)
The port of the CC. Default: 8083
ZITADEL_CLIENT_ID (required)
The client ID of the Zitadel service user that has permission to ingest data into Unique AI.
ZITADEL_CLIENT_SECRET (required)
The client secret generated for the Zitadel service user.
CONFLUENCE_TOKENS_PER_MINUTE
Rate limiter for the API requests to Confluence. 1 request = 1 token. Default: 250
CONFLUENCE_URL (required)
The URL of your Confluence instance. For the local setup, this is http://localhost:1990/confluence
Important: Include the http / https prefix.
CONFLUENCE_INSTANCE_TYPE (required)
"ONPREM" or "CLOUD"
CONFLUENCE_PAT (OnPrem only; required unless CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD are set)
Personal Access Token of the Confluence service user. The CC makes the Confluence API requests as this user.
CONFLUENCE_USERNAME (OnPrem only)
CONFLUENCE_PASSWORD (OnPrem only)
Basic auth credentials, intended for testing. In the local setup, both are "admin".
CONFLUENCE_CLOUD_USER (Cloud only)
The Confluence service user (email address) with access to the appropriate spaces.
CONFLUENCE_CLOUD_TOKEN (Cloud only)
The API token generated for the Confluence Cloud service user.
CONFLUENCE_CLOUD_ID (Cloud only)
The ID of your Confluence Cloud instance. You can get it from https://<baseUrl>.atlassian.net/_edge/tenant_info
CRON_SCHEDULE
Defines how often the CC should sync the Confluence data with Unique AI using the cron format: "* * * * *"
UNIQUE_INGESTION_URL (required)
The ingestion endpoint of Unique AI. Example: https://gateway.<baseUrl>/ingestion/v1/content, or the cluster-local service URL, e.g. http://node-ingestion.chat.svc.cluster.local:8091/v1/content
Important: Include the http / https prefix.
UNIQUE_SERVICE_AUTH_MODE (external or cluster_local)
Use cluster_local to avoid hairpinning. If you must leave the cluster, use external.
UNIQUE_SERVICE_EXTRA_HEADERS (json)
Extra headers in JSON format, e.g. for cluster_local auth mode: '{"x-service-id": "confluence-connector", "x-company-id": "xxxx", "x-user-id": "xxx"}'
INGEST_ALL_LABEL (required)
The Confluence label that marks a page to be ingested together with all its sub-pages (recursively). Default: "ai-ingest-all"
INGEST_SINGLE_LABEL (required)
The Confluence label that marks a single page to be ingested. Default: "ai-ingest"
TEST_MODE
When set to true, the CC runs the sync process without ingesting anything. Default: false
ZITADEL_OAUTH_TOKEN_URL
The Zitadel endpoint that issues the token used for ingestion. Example: https://id.<baseUrl>/oauth/v2/token
Important: Include the http / https prefix.
ZITADEL_SERVICE_EXTRA_HEADERS (json)
Extra headers for Zitadel, if needed for external auth mode, e.g. '{"x-zitadel-instance-host": "id.<baseUrl>.unique.app"}'
ZITADEL_PROJECT_ID (required)
The ID of the Unique AI project in Zitadel for which the service user generates its token.
UNIQUE_SCOPE_ID
The Knowledge Base scope in Unique AI into which the data is ingested. If no scope ID is given, the connector auto-creates a scope for each space and ingests the documents into the respective scope. Note: You will need to give users access to the auto-created scopes.
DEBUG_MODE
When set to true, all output is written to the log file. Default: false
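Taken together, the variables above imply a set of start-up checks. The following sketch validates an env object against the descriptions on this page (required variables, defaults, the http/https prefix, and the OnPrem/Cloud credential split). It is a hypothetical illustration, not the connector's own validation; in particular, which Cloud variables are mandatory is an assumption.

```typescript
type Env = Record<string, string | undefined>;
type InstanceType = "ONPREM" | "CLOUD";

interface ConnectorConfig {
  port: number;
  instanceType: InstanceType;
  confluenceUrl: string;
  ingestionUrl: string;
  tokensPerMinute: number;
  singleLabel: string;
  allLabel: string;
  testMode: boolean;
  debugMode: boolean;
  cronSchedule?: string;
  scopeId?: string;
  serviceExtraHeaders: Record<string, string>;
}

// Throws if a variable is unset or empty.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required env variable ${name}`);
  return value;
}

// Required URL that must carry the http:// or https:// prefix.
function requireHttpUrl(env: Env, name: string): string {
  const value = requireVar(env, name);
  if (!/^https?:\/\//.test(value)) {
    throw new Error(`${name} must include the http:// or https:// prefix`);
  }
  return value;
}

// Optional positive integer with a documented default.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (!raw) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function isInstanceType(value: string): value is InstanceType {
  return value === "ONPREM" || value === "CLOUD";
}

function loadConfig(env: Env): ConnectorConfig {
  const instanceType = requireVar(env, "CONFLUENCE_INSTANCE_TYPE");
  if (!isInstanceType(instanceType)) {
    throw new Error(`CONFLUENCE_INSTANCE_TYPE must be "ONPREM" or "CLOUD", got "${instanceType}"`);
  }
  for (const name of ["ZITADEL_CLIENT_ID", "ZITADEL_CLIENT_SECRET", "ZITADEL_PROJECT_ID"]) {
    requireVar(env, name);
  }
  if (instanceType === "ONPREM") {
    // OnPrem: a PAT, or username + password (intended for testing).
    const hasBasicAuth = Boolean(env.CONFLUENCE_USERNAME && env.CONFLUENCE_PASSWORD);
    if (!env.CONFLUENCE_PAT && !hasBasicAuth) {
      throw new Error("ONPREM requires CONFLUENCE_PAT or CONFLUENCE_USERNAME + CONFLUENCE_PASSWORD");
    }
  } else {
    // Assumption: Cloud needs at least the service user and its API token.
    requireVar(env, "CONFLUENCE_CLOUD_USER");
    requireVar(env, "CONFLUENCE_CLOUD_TOKEN");
  }
  return {
    port: positiveInt(env, "PORT", 8083),
    instanceType,
    confluenceUrl: requireHttpUrl(env, "CONFLUENCE_URL"),
    ingestionUrl: requireHttpUrl(env, "UNIQUE_INGESTION_URL"),
    tokensPerMinute: positiveInt(env, "CONFLUENCE_TOKENS_PER_MINUTE", 250),
    singleLabel: env.INGEST_SINGLE_LABEL || "ai-ingest",
    allLabel: env.INGEST_ALL_LABEL || "ai-ingest-all",
    testMode: env.TEST_MODE === "true",
    debugMode: env.DEBUG_MODE === "true",
    cronSchedule: env.CRON_SCHEDULE || undefined,
    scopeId: env.UNIQUE_SCOPE_ID || undefined,
    serviceExtraHeaders: JSON.parse(env.UNIQUE_SERVICE_EXTRA_HEADERS || "{}"),
  };
}
```

In a Node process this would be called once at start-up as loadConfig(process.env), so that a missing PAT or a URL without its scheme fails before the first sync.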
Using proxies and custom certs
If you use proxies or custom certs, you have to define the relevant env variables. Example:
NODE_EXTRA_CA_CERTS="/node/my_custom_ca.cert"
HTTPS_PROXY="https://myproxy:8080"
NO_PROXY="localhost,*.mydomain"
When using helmfiles, you also need to mount the certificate volume. Example:
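extraVolumes:
  - name: custom-ca-cert
    secret:
      secretName: custom-ca-cert
extraVolumeMounts:
  - name: custom-ca-cert
    mountPath: /node/my_custom_ca.cert
    # mounts a single file; assumes the secret stores the certificate under the key my_custom_ca.cert
    subPath: my_custom_ca.cert
Delete and reset ingested files manually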
Delete individual files
You can use the following DELETE endpoint:
DELETE localhost:8083/content/:contentId
Reset entire scope
If /sync does not automatically delete ingested files, the cause may be a misconfiguration during testing that associated the files with the wrong space, Confluence instance, project, etc.
You can use the following DELETE endpoint of the CC to trigger a manual reset, which deletes all ingested Confluence pages for a given scope ID so you can start again with a clean slate:
DELETE localhost:8083/reset/:scopeId
/reset may need additional parameters to identify the files correctly. In that case, provide a partialKey in the request body. This can be your Confluence URL (the same value as CONFLUENCE_URL) or the space prefix (spaceId_spaceKey).
Examples:
{
  "partialKey": "http://localhost:1990/confluence"
}
{
  "partialKey": "3581239_TES"
}
The second example is the spaceId and the spaceKey (shortened space name) concatenated with a "_" in between.
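When scripting a reset, the request can be assembled as below. Only the endpoint path and the partialKey format come from this page; the helper names are hypothetical, and sending the body as JSON with Content-Type: application/json is an assumption.

```typescript
// Hypothetical request description, so it can be inspected or passed to fetch().
interface ResetRequest {
  url: string;
  method: "DELETE";
  headers: Record<string, string>;
  body?: string;
}

// Space prefix format described above: spaceId and spaceKey joined by "_".
function spacePartialKey(spaceId: string | number, spaceKey: string): string {
  return `${spaceId}_${spaceKey}`;
}

// Builds DELETE <baseUrl>/reset/:scopeId, optionally with a partialKey body.
// Assumption: the CC reads the body as JSON.
function buildResetRequest(baseUrl: string, scopeId: string, partialKey?: string): ResetRequest {
  const url = `${baseUrl.replace(/\/+$/, "")}/reset/${encodeURIComponent(scopeId)}`;
  if (partialKey === undefined) {
    return { url, method: "DELETE", headers: {} };
  }
  return {
    url,
    method: "DELETE",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ partialKey }),
  };
}

const req = buildResetRequest("http://localhost:8083", "scope_123", spacePartialKey(3581239, "TES"));
// To send it against a running CC: await fetch(req.url, req);
console.log(req.method, req.url, req.body);
```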
Example Helmfiles
Example helmfiles can be found in the release repo: confluence-connector.yaml
Local Setup
Set up the Atlassian Plugin SDK to run a local Confluence instance:
Follow this guide: https://developer.atlassian.com/server/framework/atlassian-sdk/set-up-the-atlassian-plugin-sdk-and-build-a-project/
Follow it up to the "create a plugin" step; that step itself is not needed.
Alternatively, for Data Center: https://www.atlassian.com/software/confluence/download-archives
Run the Atlassian instance:
The `atlassian-confluence/server` folder contains a tutorial on how to make a macro. We only need the working server, not the macro, so that we can access Confluence locally.
From the `atlassian-confluence/server` folder, run the command `atlas-run`.
This takes some time on the first run. When it is done, you should be able to reach your local Confluence instance at `localhost:1990/confluence`.
Credentials for the local login:
user: admin, password: admin
Here is an example REST API URL that gets all pages with the `ai-ingest` and `ai-ingest-all` labels and expands their labels (so they are included in the JSON response):
http://localhost:1990/confluence/rest/api/content/search?cql=(label=%22ai-ingest%22)%20OR%20(label=%22ai-ingest-all%22)&expand=metadata.labels&limit=1&start=0
Read more about cql here: https://developer.atlassian.com/server/confluence/advanced-searching-using-cql/
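Properties listed under _expandable are empty by default but can contain data once expanded. For example, _expandable.body is generally empty; if you add the query parameter &expand=body.view, the response includes the rendered body (use &expand=body.storage to see the storage format that the CC actually ingests).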
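The example URL returns a single result (limit=1, start=0). To page through all labeled pages on the local instance, the start offset can be advanced by limit until a shorter page comes back. A sketch under that assumption (buildSearchUrl and collectAll are hypothetical; fetchPage is injected so the code runs without a Confluence instance):

```typescript
// Fields used from a content search response page.
interface SearchPage<T> {
  results: T[];
  size: number;
}

// Builds a content search URL for one window of results (offset paging).
function buildSearchUrl(baseUrl: string, cql: string, start: number, limit: number, expand: string[]): string {
  const params = new URLSearchParams({
    cql,
    expand: expand.join(","),
    limit: String(limit),
    start: String(start),
  });
  return `${baseUrl}/rest/api/content/search?${params.toString()}`;
}

// Advances `start` by `limit` until a page returns fewer than `limit` results.
async function collectAll<T>(
  fetchPage: (url: string) => Promise<SearchPage<T>>,
  baseUrl: string,
  cql: string,
  limit = 25,
  expand: string[] = ["metadata.labels"],
): Promise<T[]> {
  const all: T[] = [];
  for (let start = 0; ; start += limit) {
    const page = await fetchPage(buildSearchUrl(baseUrl, cql, start, limit, expand));
    all.push(...page.results);
    if (page.size < limit) return all;
  }
}
```

Against the local instance, fetchPage could wrap fetch with basic auth, e.g. `async (url) => (await fetch(url, { headers: { Authorization: "Basic " + btoa("admin:admin") } })).json()`.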
Run the Confluence scanner:
From add-ins/atlassian-confluence, run
npm run start
Migration from GHCR to ACR
With the switch to ACR, we improved env variable naming, fixed the breaking Confluence v9 changes, and added Cloud support, among other things.
As listed in the example section, many env variables now have better name-spacing (e.g. CLIENT_ID => ZITADEL_CLIENT_ID, APP_PORT => PORT, etc.). Make sure you update your env variables accordingly.
The image is now hosted on Unique's container registry (uniquecr.azurecr.io).
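For existing deployments, renaming the variables in an .env file can be scripted. The sketch below only knows the two renames named on this page (CLIENT_ID and APP_PORT); the full list of renamed variables is not reproduced here, so compare the remaining variables against the example env section. migrateEnvFile is a hypothetical helper.

```typescript
// Only the renames named on this page; extend after checking the example env section.
const RENAMED_ENV_VARS = new Map<string, string>([
  ["CLIENT_ID", "ZITADEL_CLIENT_ID"],
  ["APP_PORT", "PORT"],
]);

// Rewrites old variable names at the start of NAME=value lines; comments and
// all other lines are left untouched.
function migrateEnvFile(content: string): string {
  return content
    .split("\n")
    .map((line) => {
      const match = /^(\s*)([A-Z0-9_]+)(\s*=)/.exec(line);
      if (!match) return line;
      const renamed = RENAMED_ENV_VARS.get(match[2]);
      if (!renamed) return line;
      return match[1] + renamed + line.slice(match[1].length + match[2].length);
    })
    .join("\n");
}

console.log(migrateEnvFile("APP_PORT=8083\nCLIENT_ID=confluence-onprem@example.com"));
```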