Elasticsearch Infrastructure


Description

Elasticsearch (ES) complements the dense retrieval (vector search with Qdrant) in the Unique platform by adding the state-of-the-art sparse retrieval technique BM25. Elasticsearch provides stronger relevance scoring for keyword-based queries and improves overall search quality. Our platform previously used PostgreSQL's built-in full-text search with n-gram-based similarity matching (pg_trgm extension). While functional, this approach had several limitations for FSI requirements.

Architecture overview

Planning

Elasticsearch Deployment Options: Self-hosted vs Managed Services

When using our Elasticsearch-powered feature, you'll need to decide how to deploy the underlying Elasticsearch infrastructure. This choice affects operational overhead, costs, and performance.

Use Your Existing Elasticsearch Deployment

If you already have Elasticsearch running, you can connect our feature directly to your existing cluster. This is often the most cost-effective option since you're leveraging infrastructure you're already maintaining. You'll need Elasticsearch version 7.x or higher with available storage capacity.

Managed Elasticsearch Services

Managed services like Amazon OpenSearch Service, Elastic Cloud, and Google Cloud Elasticsearch offer the fastest deployment path. They provide automatic maintenance, built-in scaling, comprehensive monitoring, and high availability. However, they come with higher ongoing costs and less configuration control.

Self-hosted Elasticsearch

Self-hosting provides complete cost control and customization capabilities while maintaining data sovereignty. The trade-off is significant operational overhead requiring Elasticsearch expertise, plus responsibility for all maintenance, updates, and monitoring.

Our Recommendation

Use your existing Elasticsearch if you already have a running cluster with available capacity. Start with a managed service if you lack Elasticsearch experience. Consider self-hosting if you have a dedicated infrastructure team or strict compliance requirements. The following sections explain how to set up and configure the service using the ECK operator.

Budget

The resources required depend on the volume of documents and the number of users. For details, see Sizing.

The cost incurred will depend on the pricing model of your cloud provider.

Examples

These are rough estimates. The actual costs depend on usage patterns, deployment regions, and pricing variations. Use the following at your own discretion!

Example 1

With 100 users, an Elasticsearch cluster with 3 nodes, and 5 GB of data stored, you can expect:

2 CPUs * 3 nodes = 6 CPUs
2.5 Gi Memory * 3 nodes = 7.5 Gi Memory
8 (5 + 3 buffer) GB * 3 nodes = 24 GB Storage

For Azure pricing in Switzerland North, this can be one D8s v5 node ($370) and 3 ZRS disks with 8 GB capacity each ($10), making a total of:
$380 per month

Example 2

With 5000 users, an Elasticsearch cluster with 3 nodes, and 50 GB of data stored, the cost impact will be roughly:

4.5 CPUs * 3 nodes = 13.5 CPUs
4.5 Gi Memory * 3 nodes = 13.5 Gi Memory
80 (50 + 30 buffer) GB * 3 nodes = 240 GB Storage

For Azure pricing in Switzerland North, this can be two D8s v5 nodes ($740) and 3 ZRS disks of 128 GiB capacity each ($70), making a total of:
$810 per month
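
The arithmetic in both examples follows the same pattern, which can be sketched as a small shell helper. The function name `cluster_totals` and its argument order are illustrative assumptions, not part of the platform. Memory is passed in Mi (2.5 Gi = 2560 Mi) so that the calculation stays in integers.

```bash
#!/usr/bin/env bash
# Sketch: total cluster resources from per-node figures (integer arithmetic).
# cluster_totals NODES CPU_MILLICORES MEMORY_MI DATA_GB BUFFER_GB
# (hypothetical helper, not shipped with the platform)
cluster_totals() {
  local nodes=$1 cpu_m=$2 mem_mi=$3 data_gb=$4 buffer_gb=$5
  echo "cpu_millicores=$(( nodes * cpu_m ))"
  echo "memory_mi=$(( nodes * mem_mi ))"
  echo "storage_gb=$(( nodes * (data_gb + buffer_gb) ))"
}

# Example 1: 3 nodes, 2 CPUs (2000m), 2.5 Gi (2560 Mi), 5 GB data + 3 GB buffer
cluster_totals 3 2000 2560 5 3
# cpu_millicores=6000
# memory_mi=7680
# storage_gb=24
```

Here 7680 Mi corresponds to the 7.5 Gi of Example 1. The provider prices are deliberately left out, since they change by region and over time.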

Provisioning

Pre-requisites

Make sure that you familiarize yourself with the Elasticsearch compatibility matrix for operating systems and that your OS is supported.

Make sure that your Kubernetes version is compatible by consulting the following compatibility matrix.

Deployment

If you deploy a standalone Elasticsearch cluster instead of using a managed service, we recommend using the ECK operator.

Helm chart: elastic/eck-operator

Operator (Chart) Version: >= 3.0.0

Elasticsearch version: >= 9.0.2

For simplicity of deployment we recommend setting node.store.allow_mmap: false

For resource allocation recommendations, consult Sizing
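
As a rough illustration of the settings above, the following sketch prints a minimal ECK `Elasticsearch` manifest for a 3-node cluster with `node.store.allow_mmap: false` (this setting avoids having to raise `vm.max_map_count` on the Kubernetes nodes). The function name, the nodeSet name `default`, and the concrete resource and storage figures are assumptions; the figures are taken from the 100-user sizing row and Example 1, and the default name and namespace mirror the example URL used elsewhere in this guide. Adjust all of them to your environment.

```bash
#!/usr/bin/env bash
# Sketch: print a minimal ECK Elasticsearch manifest (hypothetical helper).
# The operator itself can be installed beforehand, for example with:
#   helm repo add elastic https://helm.elastic.co
#   helm install elastic-operator elastic/eck-operator \
#     -n elastic-system --create-namespace
render_es_manifest() {
  local name=${1:-elasticsearch-ingestion} namespace=${2:-chat} version=${3:-9.0.2}
  cat <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  version: ${version}
  nodeSets:
    - name: default            # assumed nodeSet name
      count: 3                 # 3 nodes for high availability
      config:
        node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:       # assumed values, see Sizing
                requests:
                  cpu: 2000m
                  memory: 2560Mi
                limits:
                  memory: 2560Mi
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 8Gi   # assumed value, see Volume storage
EOF
}

# Usage:
#   render_es_manifest | kubectl apply -f -
```

No storage class is set in this sketch, so the cluster default would apply; pick one that matches the storage requirements described under Sizing.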

Environment variables and secrets

Name	Component	Description	Example value	Default	Required
`ELASTICSEARCH_CA_BUNDLE`	node-ingestion	Path to the Elastic CA certificate. When using ECK, this can be mounted into the pods from the secret called `<<es-deployment-name>>-es-http-certs-public`.	/etc/ssl/elastic/ca.crt	""	Only if using a private CA or a self-signed certificate
`ELASTICSEARCH_URL`	node-ingestion	When using ECK, the default value is `https://<<es-deployment-name>>-es-http.chat.svc:9200`.	https://elasticsearch-ingestion-es-http.chat.svc:9200	""	Yes, if `FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING=true`
`ELASTICSEARCH_USERNAME`	node-ingestion	ES user name.	elastic	""	Yes
`ELASTICSEARCH_PASSWORD`	node-ingestion	ES password. When using the ECK operator, this value is set automatically and is available in the Kubernetes secret `<<es-deployment-name>>-es-elastic-user`.	password	""	Yes
`FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING`	node-ingestion	Should be enabled ("true") when the ES cluster is up and running.	"false"	"false"	See description
`FEATURE_FLAG_ENABLE_SEARCH_ADAPTER_ELASTICSEARCH_UN_9883`	node-ingestion	Should be enabled ("true") only after successful initial indexing, see below.	"false"	"false"	See description
`ELASTICSEARCH_MAX_RETRIES`	node-ingestion	Sets the maximum number of retry attempts for failed requests.		5	No
`ELASTICSEARCH_REQUEST_TIMEOUT`	node-ingestion	Sets the timeout in milliseconds for individual requests.		60000	No
`ELASTICSEARCH_SNIFF_ON_START`	node-ingestion	Controls whether the client should discover other cluster nodes on initialization. See "what is sniffing".		"false"	No
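
When the cluster is managed by ECK, the values for the password, CA bundle, and URL can be read from the resources the operator creates. The sketch below shows one way to do that with `kubectl`. `ES_NAME` and `ES_NAMESPACE` are placeholders whose defaults mirror the example URL in the table, the function names are illustrative, and the `ca.crt` key is assumed to be present in the public certificate secret, as it is for ECK-managed self-signed certificates.

```bash
#!/usr/bin/env bash
# Sketch: read ECK-generated connection details (hypothetical helpers).
ES_NAME=${ES_NAME:-elasticsearch-ingestion}
ES_NAMESPACE=${ES_NAMESPACE:-chat}

# Password of the built-in "elastic" user (value for ELASTICSEARCH_PASSWORD).
es_password() {
  kubectl get secret "${ES_NAME}-es-elastic-user" -n "${ES_NAMESPACE}" \
    -o go-template='{{.data.elastic | base64decode}}'
}

# CA certificate (file content behind ELASTICSEARCH_CA_BUNDLE).
# Assumes the secret carries a "ca.crt" key.
es_ca_cert() {
  kubectl get secret "${ES_NAME}-es-http-certs-public" -n "${ES_NAMESPACE}" \
    -o go-template='{{index .data "ca.crt" | base64decode}}'
}

# In-cluster service URL (value for ELASTICSEARCH_URL).
es_url() {
  echo "https://${ES_NAME}-es-http.${ES_NAMESPACE}.svc:9200"
}

# Usage:
#   es_ca_cert > ca.crt
#   export ELASTICSEARCH_URL=$(es_url) ELASTICSEARCH_PASSWORD=$(es_password)
```

In a real deployment the secrets would normally be mounted or referenced from the Helm values rather than exported by hand; the functions are mainly useful for debugging.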

Real-life deployment example

Step 1 - Deploying Elasticsearch cluster

https://github.com/Unique-AG/hello-azure/commit/79d0fda466633d38d1f2f4dbce5bbaed8da94efd

Step 2 - Enabling Elastic indexing

https://github.com/Unique-AG/hello-azure/commit/f3ccf0ec2d0c424a1ea2b0fdca9e164fdfa721c7

Step 3 - Triggering Reindexing using the service user

bash

# Request an access token for the service user (OAuth 2.0 client credentials grant).
ACCESS_TOKEN=$(curl -s --location 'https://id.hello.azure.unique.dev/oauth/v2/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
-u "es-ingestion:${ES_SERVICE_USER_CLIENT_ID}" \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode "scope=openid profile email urn:zitadel:iam:user:resourceowner urn:zitadel:iam:org:projects:roles urn:zitadel:iam:org:project:id:${PROJECT_ID}:aud" | jq -r .access_token)

# Trigger a full rebuild of the Elasticsearch indexes.
curl --location --request POST 'https://api.hello.azure.unique.dev/ingestion/v1/maintenance/rebuild-full-elasticsearch-indexes' --header "Authorization: Bearer $ACCESS_TOKEN"

Step 4 - Enabling search with Elastic

https://github.com/Unique-AG/hello-azure/commit/2240f7a9a69e3f48db5e9cee26fa86354f25826f

Connectivity remarks

Currently it is only possible to connect to the Elasticsearch cluster using basic authentication (by passing ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD to the node-ingestion service).
The authenticated user must be able to create indices.
For fine-tuning the connection settings, see the following environment variables: ELASTICSEARCH_MAX_RETRIES, ELASTICSEARCH_REQUEST_TIMEOUT, ELASTICSEARCH_SNIFF_ON_START. For more details, see the Environment variables and secrets section.

Upgrades

Running a 3-node cluster allows, in principle, for zero-downtime upgrades. For details, consult the Elasticsearch guide. Make sure to cross-check operator compatibility with the Elasticsearch version you want to deploy.

Sizing

Compute / memory

We recommend a cluster consisting of 3 nodes for high availability.

The recommended resource allocations per node are as follows:

Concurrent Users	Node CPU / Memory
10	200-500m / 2-2.5Gi
100	2000-2500m / 2.5-3Gi
1000	3500-4000m / 3.5-4Gi
4000	4000-4500m / 4-4.5Gi
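
The table can also be expressed as a simple lookup. The sketch below picks the smallest tier that covers a given number of concurrent users; rounding up to the next tier for values between rows is an assumption of this sketch, and the function name is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: per-node recommendation for a number of concurrent users.
# Values between two rows of the table are rounded up to the next tier
# (an assumption, the table itself only lists the four tiers).
recommended_node_resources() {
  local users=$1
  if   (( users <= 10 ));   then echo "cpu=200-500m memory=2-2.5Gi"
  elif (( users <= 100 ));  then echo "cpu=2000-2500m memory=2.5-3Gi"
  elif (( users <= 1000 )); then echo "cpu=3500-4000m memory=3.5-4Gi"
  elif (( users <= 4000 )); then echo "cpu=4000-4500m memory=4-4.5Gi"
  else
    echo "no recommendation above 4000 concurrent users" >&2
    return 1
  fi
}

recommended_node_resources 250   # prints: cpu=3500-4000m memory=3.5-4Gi
```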

Volume storage

In contrast to Qdrant, Elasticsearch stores the Chunk text in addition to the index. The storage volume should therefore be estimated based on the Chunk table size of the unique-ingestion-development database plus the approximate upload file volume.

The storage should be SSD-based, zone-redundant and support volume expansion.

Initial indexing

Prerequisites:

The ES cluster must be up and running.
The ES cluster is connected to the node-ingestion service. You should find the following entry in the logs from when the service started: [ElasticSearchService] Elasticsearch: Successfully connected. If the connection failed, an error is logged instead.
FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING should be set to "true".
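
The first two prerequisites can be checked from a shell. The following is a minimal sketch, assuming the connection variables from the environment table are exported locally and that the cluster is reachable from where the script runs; the function names and the workload name in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight checks before the initial indexing (hypothetical helpers).
# Expects ELASTICSEARCH_URL, ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD,
# plus ELASTICSEARCH_CA_BUNDLE when a private CA is used.

# Prints the cluster health status (green, yellow or red).
es_health_status() {
  local ca_args=()
  if [ -n "${ELASTICSEARCH_CA_BUNDLE:-}" ]; then
    ca_args=(--cacert "${ELASTICSEARCH_CA_BUNDLE}")
  fi
  curl -sS --fail "${ca_args[@]}" \
    -u "${ELASTICSEARCH_USERNAME}:${ELASTICSEARCH_PASSWORD}" \
    "${ELASTICSEARCH_URL}/_cat/health?h=status" | tr -d '[:space:]'
}

# Succeeds if the log lines read on stdin contain the success entry.
ingestion_connected() {
  grep -q 'Elasticsearch: Successfully connected'
}

# Usage ("deploy/node-ingestion" and "chat" are placeholders):
#   [ "$(es_health_status)" = "green" ] && echo "cluster healthy"
#   kubectl logs deploy/node-ingestion -n chat | ingestion_connected \
#     && echo "ingestion connected"
```

An empty result from `es_health_status` means the request itself failed, for example because of wrong credentials or an untrusted certificate.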

With the ES cluster up and running and connected to the node-ingestion service, the feature flag enabled, and the connection verified in the logs, it's now time to re-index the existing chunk data. New chunk data is already being indexed into ES, but the existing chunks must be re-indexed using the Elasticsearch Reindexing Job.

The Elasticsearch Reindexing Job is defined as a Kubernetes CronJob. The schedule is set to a date that never occurs, the 31st of February, to prevent automatic execution. The job should be triggered manually when needed, ideally only once, after setting up the Elasticsearch cluster.

Job Definition Setup

Requirements

Setting up the reindexing job requires deploying the ingestion service with the backend-service chart in version >=4.5.0.

The ingestion service (AKA node-ingestion, bs-ingestion, backend-service-ingestion) is referred to as INGESTION_SERVICE_NAME in the code snippets; this is the Helm release name of the ingestion service.

The namespace where ingestion service is deployed will be referred to as NAMESPACE.

Setup

In the deployment of ingestion service add the following to your existing Helm values:

yaml

extraCronJobs:
  elasticsearch-indexing:
    schedule: "* * 31 2 *"
    restartPolicy: OnFailure
    env:
      RUNNING_MODE: elasticsearch-indexing

Option 1: Trigger from Argo CD UI

You can trigger the CronJob directly from the Argo CD web interface.

Steps

Open Argo CD UI and go to the Application managing the job (e.g., node-ingestion).

In the resource tree, find and click on:

yaml

CronJob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing

Click the "Create Job" button.
Confirm to launch a one-time job from the CronJob spec.

Option 2: Trigger via `kubectl`

Use this method to trigger the job from the command line.

Steps

Create a one-time job from the CronJob:

bash

kubectl create job \
  --from=cronjob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing \
  <<INGESTION_SERVICE_NAME>>-elasticsearch-manual-$(date +%s) \
  -n <<NAMESPACE>>

Check job status:

```bash
kubectl get jobs -n <<NAMESPACE>>
```
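
The two steps above can be combined into one helper that creates the job, waits for it to complete, and prints the tail of its logs. This is only a sketch: the function name and the 6h default timeout are assumptions, chosen because the initial indexing may run for several hours.

```bash
#!/usr/bin/env bash
# Sketch: create the one-time job, wait for completion, then print its logs.
# run_reindex_job INGESTION_SERVICE_NAME NAMESPACE [TIMEOUT]
# (hypothetical helper; the 6h default timeout is an assumption)
run_reindex_job() {
  local service=$1 namespace=$2 timeout=${3:-6h}
  local job="${service}-elasticsearch-manual-$(date +%s)"

  kubectl create job "${job}" \
    --from="cronjob/${service}-elasticsearch-indexing" -n "${namespace}" || return 1

  # Blocks until the job reports the Complete condition or the timeout
  # expires; a failed job therefore surfaces as a timeout here.
  kubectl wait --for=condition=complete "job/${job}" \
    -n "${namespace}" --timeout="${timeout}"
  local rc=$?

  kubectl logs "job/${job}" -n "${namespace}" --tail=50
  return "${rc}"
}

# Usage:
#   run_reindex_job <<INGESTION_SERVICE_NAME>> <<NAMESPACE>>
```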

Elasticsearch Indexing Performance Configuration

Overview

This document describes key environment variables used to control Elasticsearch indexing performance in the node-ingestion service. These settings are read via configMap/.env and allow tuning of batch behavior and concurrency.

Indexing Configuration Variables

Variable	Default	Description
`ELASTICSEARCH_INDEXING_BULK_BATCH_SIZE`	`2000`	Number of documents submitted in a single Elasticsearch bulk request.
`ELASTICSEARCH_INDEXING_ENTRIES_PER_BATCH`	`2000`	Number of data entries grouped into one batch before preparing for ingest.
`ELASTICSEARCH_INDEXING_COMPANY_CONCURRENCY_LIMIT`	`1`	Maximum number of companies processed in parallel during indexing.

Notes

Bulk Batch Size and Entries Per Batch allow independent tuning for memory usage vs throughput.
Company Concurrency Limit helps prevent resource exhaustion when multiple companies are being indexed simultaneously.
All settings are adjustable through environment variables or config maps.
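
To get a feel for the effect of the bulk batch size, the number of bulk requests for a given document count can be estimated with a ceiling division. This is a back-of-envelope sketch that assumes documents are split evenly into bulk requests, which this guide does not state explicitly; the function name is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: estimated number of bulk requests for a document count at a given
# ELASTICSEARCH_INDEXING_BULK_BATCH_SIZE (ceiling division).
bulk_request_count() {
  local documents=$1 batch_size=${2:-2000}
  echo $(( (documents + batch_size - 1) / batch_size ))
}

bulk_request_count 1000000        # default batch size of 2000 -> 500
bulk_request_count 1000000 500    # smaller batches -> 2000
```

Smaller batches mean more requests with a lower memory footprint per request; larger batches trade memory for fewer round trips.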

Operating & troubleshooting

Authentication methods

ES authentication is done via username and password, see env variables above.

Troubleshooting

How can I verify that the node-ingestion service is connected to the ES cluster?
- You should find [ElasticSearchService] Elasticsearch: Successfully connected in the logs of the node-ingestion service when the service was started. Otherwise, an error will be displayed.
How can service outages and indexing failures be monitored / noticed?
- When an ES outage occurs, the ingestion of files fails and the failure is visible in the UI.
How can we resolve failures during indexing?
- Use the curl command described in Step 3 to trigger a full re-index.
When should the memory / volume be increased?
- At the latest when 80% of the memory / volume is in use.
How long does the initial indexing take?
- The timeline for re-indexing depends on the volume of Chunk data in the database and may take multiple hours to complete.
Will there be any downtime for updating / upgrading versions?
- No downtime should be expected as the ECK operator will perform a rolling update one node at a time.
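
For the 80% volume threshold mentioned above, disk usage per node can be read from the Elasticsearch `_cat/allocation` API. The sketch below filters its plain-text output; the function name is an assumption, and the connection variables are the ones from the environment table.

```bash
#!/usr/bin/env bash
# Sketch: list nodes whose disk usage has reached a threshold (default 80%).
# Reads the plain-text output of
#   GET _cat/allocation?h=disk.percent,node
# on stdin; rows without a numeric percentage (e.g. UNASSIGNED) are skipped.
nodes_over_disk_threshold() {
  local threshold=${1:-80}
  awk -v t="${threshold}" \
    '$1 ~ /^[0-9]+$/ && $1 + 0 >= t { print $2 " " $1 "%" }'
}

# Usage (add --cacert "${ELASTICSEARCH_CA_BUNDLE}" for a private CA):
#   curl -sS -u "${ELASTICSEARCH_USERNAME}:${ELASTICSEARCH_PASSWORD}" \
#     "${ELASTICSEARCH_URL}/_cat/allocation?h=disk.percent,node" \
#     | nodes_over_disk_threshold 80
```

An empty result means no node has reached the threshold yet.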

Architecture overview

System context

The diagram illustrates the system context for Elasticsearch (ES), highlighting the high-level interactions between a human end user (or Custom App) and the underlying software systems. ES is used in two distinct use cases:

The first context is Unique AI Ingestion: The end user (or a Custom App) uploads a file in the Knowledge Base or Chat UI. The file is uploaded via secure HTTPS to the File Storage, and the Ingestion (extraction of the text and creation of Chunks) is triggered in the Unique AI software system. Once the Chunks have been created, they are sent together with the metadata to ES, to be indexed with BM25, and to Qdrant, to be stored as vector embeddings.
The second context is Unique AI Hybrid Search: The end user (or a Custom App) types a query in the agent chat UI (or submits a search request via the public API) and triggers a search request via the Unique AI software system. The search request (including relevant metadata) is then forwarded to ES and Qdrant, which match and return relevant Chunks.

Container overview

The container diagram for Unique AI Ingestion depicts how a file uploaded by an end user is processed to make it available in Search.
