Elasticsearch Infrastructure
Description
Elasticsearch (ES) complements dense retrieval (vector search with Qdrant) in the Unique platform by adding BM25, a state-of-the-art sparse retrieval technique. It provides better relevance scoring for keyword-based queries and improves overall search performance. The platform previously used PostgreSQL's built-in full-text search with n-gram-based similarity matching (pg_trgm extension). While functional, this approach had several limitations for FSI requirements.
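To give an intuition for what BM25 keyword scoring does, here is a minimal sketch of the textbook Okapi BM25 formula. It is not how Elasticsearch is implemented internally. The function name, the whitespace tokenizer, and the sample documents are assumptions made for illustration; `k1=1.2` and `b=0.75` are the commonly used default parameters.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "quarterly risk report",
    "annual report on credit risk exposure",
    "holiday schedule",
]
scores = bm25_scores("risk report", docs)
```

Both of the first two documents contain the query terms once, but the shorter one scores higher because of length normalization; the third document contains neither term and scores zero.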
Planning
Elasticsearch Deployment Options: Self-hosted vs Managed Services
When using our Elasticsearch-powered feature, you'll need to decide how to deploy the underlying Elasticsearch infrastructure. This choice affects operational overhead, costs, and performance.
Use Your Existing Elasticsearch Deployment
If you already have Elasticsearch running, you can connect our feature directly to your existing cluster. This is often the most cost-effective option since you're leveraging infrastructure you're already maintaining. You'll need Elasticsearch version 7.x or higher with available storage capacity.
Managed Elasticsearch Services
Managed services like Amazon OpenSearch Service, Elastic Cloud, and Google Cloud Elasticsearch offer the fastest deployment path. They provide automatic maintenance, built-in scaling, comprehensive monitoring, and high availability. However, they come with higher ongoing costs and less configuration control.
Self-hosted Elasticsearch
Self-hosting provides complete cost control and customization capabilities while maintaining data sovereignty. The trade-off is significant operational overhead requiring Elasticsearch expertise, plus responsibility for all maintenance, updates, and monitoring.
Our Recommendation
Use your existing Elasticsearch if you already have a running cluster with available capacity. Start with a managed service if you lack Elasticsearch experience. Consider self-hosting if you have a dedicated infrastructure team or strict compliance requirements. The following sections explain how to set up and configure the service using the ECK operator.
Budget
The dedicated resources vary with the volume of documents and the number of users. For details, see Sizing.
The cost incurred will depend on the pricing model of your cloud provider.
Examples
These are rough estimates. The actual costs depend on usage patterns, deployment regions, and pricing variations. Use the following at your own discretion!
Example 1
With 100 users, a 3-node Elastic cluster, and 5 GB of data stored, you can expect:
2 CPUs * 3 nodes = 6 CPUs
2.5 Gi Memory * 3 nodes = 7.5 Gi Memory
8 (5 + 3 buffer) GB * 3 nodes = 24 GB Storage
For Azure pricing in Switzerland North, this can be one D8s v5 node (370$) and 3 ZRS disks with 8 GB capacity each (10$), making a total of:
380$ per month
Example 2
With 5000 users, a 3-node Elastic cluster, and 50 GB of data stored, the cost impact will be roughly:
4.5 CPUs * 3 nodes = 13.5 CPUs
4.5 Gi Memory * 3 nodes = 13.5 Gi Memory
80 (50 + 30 buffer) GB * 3 nodes = 240 GB Storage
For Azure pricing in Switzerland North, this can be two D8s v5 nodes (740$) and 3 ZRS disks of 128 GiB capacity each (70$), making a total of:
810$ per month
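The arithmetic in both examples follows the same pattern, so it can be repeated for other cluster shapes. The sketch below is illustrative only: the function and parameter names are hypothetical, and the prices are the rough monthly figures quoted above, not values fetched from any pricing API.

```python
def cluster_totals(nodes, cpu_per_node, mem_gi_per_node, data_gb, buffer_gb):
    """Return total CPU, memory (Gi), and storage (GB) across all nodes."""
    storage_per_node = data_gb + buffer_gb  # every node holds data plus buffer
    return {
        "cpus": cpu_per_node * nodes,
        "memory_gi": mem_gi_per_node * nodes,
        "storage_gb": storage_per_node * nodes,
    }


# Example 1: 100 users, 3 nodes, 5 GB data + 3 GB buffer
example_1 = cluster_totals(nodes=3, cpu_per_node=2, mem_gi_per_node=2.5,
                           data_gb=5, buffer_gb=3)
# Example 2: 5000 users, 3 nodes, 50 GB data + 30 GB buffer
example_2 = cluster_totals(nodes=3, cpu_per_node=4.5, mem_gi_per_node=4.5,
                           data_gb=50, buffer_gb=30)

# Rough monthly cost in USD: compute nodes plus ZRS disks, as quoted above.
monthly_cost_1 = 370 + 10
monthly_cost_2 = 740 + 70
```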
Provisioning
Pre-requisites
Familiarize yourself with the Elasticsearch compatibility matrix for operating systems and make sure that your OS is supported.
Make sure that your Kubernetes version is compatible by consulting the following compatibility matrix.
Deployment
If you deploy a standalone Elasticsearch cluster instead of using a managed service, we recommend using the ECK operator.
Helm chart: elastic/eck-operator
Operator (Chart) Version: >= 3.0.0
Elasticsearch version: >= 9.0.2
For simplicity of deployment we recommend setting node.store.allow_mmap: false
For resource allocation recommendations, consult Sizing
Environment variables and secrets
| Name | Component | Description | Example value | Default | Required |
|---|---|---|---|---|---|
| | node-ingestion | Path to the Elastic CA cert. When using ECK this can be mounted into the pods from secret called | /etc/ssl/elastic/ca.crt | "" | Only if using a private CA / self-signed certificate |
| | node-ingestion | When using ECK, by default the value is | | "" | Yes, if |
| ELASTICSEARCH_USERNAME | node-ingestion | ES user name | elastic | "" | Yes |
| ELASTICSEARCH_PASSWORD | node-ingestion | ES password. If using the ECK operator this value will be automatically set and available in the Kubernetes secret | password | "" | Yes |
| FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING | node-ingestion | Should be enabled ("true") when the ES cluster is up and running | "false" | "false" | See description |
| | node-ingestion | Should be enabled ("true") only after successful initial indexing, see below | "false" | "false" | See description |
| ELASTICSEARCH_MAX_RETRIES | node-ingestion | Sets the maximum number of retry attempts for failed requests | 5 | | No |
| ELASTICSEARCH_REQUEST_TIMEOUT | node-ingestion | Sets the timeout in milliseconds for individual requests | 60000 | | No |
| ELASTICSEARCH_SNIFF_ON_START | node-ingestion | Controls whether the client should discover other cluster nodes on initialization. See what is sniffing | "false" | | No |
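Before rolling out the service, it can help to check that the required credentials are actually present in the environment. The following is a minimal sketch, not part of the node-ingestion service; the helper name is hypothetical, and it only checks `ELASTICSEARCH_USERNAME` and `ELASTICSEARCH_PASSWORD`, the two required variables that this page names explicitly.

```python
import os

# Required variables named on this page; extend the tuple for your setup.
REQUIRED_VARS = ("ELASTICSEARCH_USERNAME", "ELASTICSEARCH_PASSWORD")


def missing_required(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
incomplete = missing_required({"ELASTICSEARCH_USERNAME": "elastic"})
complete = missing_required({
    "ELASTICSEARCH_USERNAME": "elastic",
    "ELASTICSEARCH_PASSWORD": "password",
})
```

Called without arguments, `missing_required()` inspects the current process environment.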
Real-life deployment example
Step 1 - Deploying Elasticsearch cluster
https://github.com/Unique-AG/hello-azure/commit/79d0fda466633d38d1f2f4dbce5bbaed8da94efd
Step 2 - Enabling Elastic indexing
https://github.com/Unique-AG/hello-azure/commit/f3ccf0ec2d0c424a1ea2b0fdca9e164fdfa721c7
Step 3 - Triggering Reindexing using the service user
ACCESS_TOKEN=$(curl -s --location 'https://id.hello.azure.unique.dev/oauth/v2/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
-u "es-ingestion:${ES_SERVICE_USER_CLIENT_ID}" \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode "scope=openid profile email urn:zitadel:iam:user:resourceowner urn:zitadel:iam:org:projects:roles urn:zitadel:iam:org:project:id:${PROJECT_ID}:aud" | jq -r .access_token)
curl --location --request POST 'https://api.hello.azure.unique.dev/ingestion/v1/maintenance/rebuild-full-elasticsearch-indexes' --header "Authorization: Bearer $ACCESS_TOKEN"
Step 4 - Enabling search with Elastic
https://github.com/Unique-AG/hello-azure/commit/2240f7a9a69e3f48db5e9cee26fa86354f25826f
Connectivity remarks
Currently it is only possible to connect to the Elasticsearch cluster using basic authentication (by passing ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD in the node-ingestion service).
The authenticated user should have the ability to create indices.
For fine-tuning the connection settings, see the following environment variables: ELASTICSEARCH_MAX_RETRIES, ELASTICSEARCH_REQUEST_TIMEOUT, ELASTICSEARCH_SNIFF_ON_START. For more details on those, see the Environment variables and secrets section.
Upgrades
Running a 3-node cluster allows, in principle, for no-downtime upgrades. For details, consult the Elasticsearch guide. Make sure to cross-check operator compatibility with the Elasticsearch version you want to deploy.
Sizing
Compute / memory
We recommend a cluster consisting of 3 nodes for high availability.
The recommended resource allocations per node are as follows:
| Concurrent Users | Node CPU / Memory |
|---|---|
| 10 | 200-500m / 2-2.5Gi |
| 100 | 2000-2500m / 2.5-3Gi |
| 1000 | 3500-4000m / 3.5-4Gi |
| 4000 | 4000-4500m / 4-4.5Gi |
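The table can be turned into a simple lookup. The sketch below is illustrative: the function name is hypothetical, and rounding up to the next listed tier for in-between user counts is an assumption, not a rule stated in this guide.

```python
import bisect

# (max concurrent users, CPU per node, memory per node), copied from the table.
TIERS = [
    (10, "200-500m", "2-2.5Gi"),
    (100, "2000-2500m", "2.5-3Gi"),
    (1000, "3500-4000m", "3.5-4Gi"),
    (4000, "4000-4500m", "4-4.5Gi"),
]


def recommended_allocation(concurrent_users):
    """Return (cpu, memory) per node for the smallest tier that fits."""
    limits = [tier[0] for tier in TIERS]
    idx = bisect.bisect_left(limits, concurrent_users)
    if idx == len(TIERS):
        # The table gives no guidance beyond its largest tier.
        raise ValueError("user count exceeds the largest tier in the table")
    _, cpu, memory = TIERS[idx]
    return cpu, memory


allocation_100 = recommended_allocation(100)
allocation_500 = recommended_allocation(500)  # rounds up to the 1000-user tier
```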
Volume storage
In contrast to Qdrant, Elasticsearch stores the Chunk text in addition to the index. The storage volume should therefore be estimated based on the Chunk table size of the unique-ingestion-development database plus the approximate upload file volume.
The storage should be SSD-based, zone-redundant and support volume expansion.
Initial indexing
Prerequisites:
ES cluster must be up and running
ES cluster is connected to the node-ingestion service: you should find the following entry in the logs when the service was started: [ElasticSearchService] Elasticsearch: Successfully connected. If the service was not successfully connected, an error will be logged instead.
FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING should be set to "true"
With the ES cluster up and running and connected to the node-ingestion, the feature flag enabled, and successful connection verification in the logs, it’s now time to re-index the existing chunk data. While new chunk data is already being indexed into ES, we need to re-index the existing chunks using the Elasticsearch Reindexing Job.
The Elasticsearch Reindexing Job is defined as a Kubernetes CronJob. The schedule is set to an invalid date, 31st of February, to prevent automatic execution and it should be triggered manually when needed – ideally only once, after setting up the Elasticsearch cluster.
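The reason this schedule never fires on its own is that the calendar never contains a 31st of February. A small sketch (the helper name and the year range are arbitrary, chosen for illustration) confirms this, including for leap years:

```python
import calendar


def cron_date_can_occur(day_of_month, month, years=range(2024, 2032)):
    """Return True if the given day/month exists in at least one of the years."""
    # monthrange returns (weekday of first day, number of days in the month).
    return any(calendar.monthrange(year, month)[1] >= day_of_month
               for year in years)


feb_31 = cron_date_can_occur(31, 2)  # the date used in the CronJob schedule
feb_29 = cron_date_can_occur(29, 2)  # exists in leap years such as 2024
```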
Job Definition Setup
Requirements
Setting up the reindexing job requires deploying the ingestion service backend-service chart in version >=4.5.0
The ingestion service (AKA node-ingestion, bs-ingestion, backend-service-ingestion) is referred to as INGESTION_SERVICE_NAME in the code snippets; this is the Helm release name of the ingestion service.
The namespace where ingestion service is deployed will be referred to as NAMESPACE.
Setup
In the deployment of ingestion service add the following to your existing Helm values:
extraCronJobs:
  elasticsearch-indexing:
    schedule: "* * 31 2 *"
    restartPolicy: OnFailure
    env:
      RUNNING_MODE: elasticsearch-indexing
Option 1: Trigger from Argo CD UI
You can trigger the CronJob directly from the Argo CD web interface.
Steps
Open the Argo CD UI and go to the Application managing the job (e.g., node-ingestion).
In the resource tree, find and click on: CronJob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing
Click the "Create Job" button.
Confirm to launch a one-time job from the CronJob spec.
Option 2: Trigger via kubectl
Use this method to trigger the job from the command line.
Steps
Create a one-time job from the CronJob:
kubectl create job \
  --from=cronjob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing \
  <<INGESTION_SERVICE_NAME>>-elasticsearch-manual-$(date +%s) \
  -n <<NAMESPACE>>
Check job status:
kubectl get jobs -n <<NAMESPACE>>
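If you prefer to script the trigger, the same command can be assembled programmatically. This is a sketch under assumptions: the helper name is hypothetical, it only builds the argument list for the `kubectl create job` call shown above, and it does not run anything by itself.

```python
import time


def build_trigger_command(release_name, namespace, timestamp=None):
    """Build the kubectl arguments that create a one-time job from the CronJob."""
    # A Unix timestamp suffix keeps job names unique, like $(date +%s) above.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    cronjob = f"{release_name}-elasticsearch-indexing"
    job_name = f"{release_name}-elasticsearch-manual-{ts}"
    return [
        "kubectl", "create", "job",
        f"--from=cronjob/{cronjob}",
        job_name,
        "-n", namespace,
    ]


# Hypothetical release name and namespace, with a fixed timestamp for clarity.
command = build_trigger_command("node-ingestion", "unique", timestamp=1700000000)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `kubectl` is configured for the target cluster.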
Elasticsearch Indexing Performance Configuration
Overview
This document describes key environment variables used to control Elasticsearch indexing performance in the node-ingestion service. These settings are read via configMap/.env and allow tuning of batch behavior and concurrency.
Indexing Configuration Variables
| Variable | Default | Description |
|---|---|---|
| | | Number of documents submitted in a single Elasticsearch bulk request. |
| | | Number of data entries grouped into one batch before preparing for ingest. |
| | | Maximum number of companies processed in parallel during indexing. |
Notes
Bulk Batch Size and Entries Per Batch allow independent tuning for memory usage vs throughput.
Company Concurrency Limit helps prevent resource exhaustion when multiple companies are being indexed simultaneously.
All settings are adjustable through environment variables or config maps.
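To illustrate how batch sizes and a concurrency limit interact, here is a generic sketch. It is not the node-ingestion implementation: the function names and parameters are hypothetical, and `index_one` stands in for whatever coroutine indexes a single company.

```python
import asyncio
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


async def index_companies(companies, index_one, concurrency_limit):
    """Run index_one for every company, at most concurrency_limit at a time."""
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def guarded(company):
        async with semaphore:  # waits while the limit is already reached
            return await index_one(company)

    # gather preserves the input order of the results.
    return await asyncio.gather(*(guarded(c) for c in companies))


# Five entries with a batch size of two produce three batches.
batches = list(batched(range(5), 2))
```

Smaller batches lower peak memory per request at the cost of more round trips, while the semaphore caps how many companies are processed at once.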
Operating & troubleshooting
Authentication methods
ES authentication is done via username and password, see env variables above.
Troubleshooting
How can I verify that the node-ingestion service is connected to the ES cluster?
You should find [ElasticSearchService] Elasticsearch: Successfully connected in the logs of the node-ingestion service when the service was started. Otherwise, an error will be displayed.
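This check can be automated by scanning captured log output for the success message. The sketch below assumes the log lines have already been collected (for example, by saving the pod logs to a file); the helper name and the sample lines are made up for illustration.

```python
SUCCESS_MARKER = "[ElasticSearchService] Elasticsearch: Successfully connected"


def is_connected(log_lines):
    """Return True if any log line contains the success message."""
    return any(SUCCESS_MARKER in line for line in log_lines)


# Made-up sample log lines for illustration.
sample_logs = [
    "2024-01-01T10:00:00Z starting service",
    "2024-01-01T10:00:02Z [ElasticSearchService] Elasticsearch: Successfully connected",
]
connected = is_connected(sample_logs)
```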
How can service outages and indexing failures be monitored / noticed?
When an ES outage occurs, the ingestion of files fails, and the failure is visible in the UI.
How can we resolve failures during indexing?
Re-trigger the reindexing using the curl command described in Step 3 of the real-life deployment example.
When should the memory / volume be increased?
At the latest when 80% of the memory or volume capacity is in use.
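The 80% rule is easy to encode in a monitoring script. This is a minimal sketch with a hypothetical helper name; how the usage figures are obtained depends on your monitoring stack and is not covered here.

```python
def should_scale_up(used, capacity, threshold=0.8):
    """Return True once usage reaches the threshold share of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity >= threshold


# 200 GB used of a 240 GB volume is about 83%, so it is time to expand.
volume_alert = should_scale_up(used=200, capacity=240)
# 100 GB used of 240 GB is about 42%, so there is still headroom.
volume_ok = should_scale_up(used=100, capacity=240)
```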
How long does the initial indexing take?
The timeline for re-indexing depends on the volume of Chunk data in the database and may take multiple hours to complete.
Will there be any downtime for updating / upgrading versions?
No downtime should be expected as the ECK operator will perform a rolling update one node at a time.
Architecture overview
System context
The diagram illustrates the system context for Elasticsearch (ES), highlighting the high-level interactions between a human end user (or Custom App) and the underlying software systems. ES is used in two distinct use cases:
The first context is Unique AI Ingestion: The end user (or a Custom App) clicks in the Knowledge Base or Chat UI to upload a file. The file is uploaded via secure HTTPS to the File Storage, and the Ingestion (extraction of the text and creation of Chunks) is triggered in the Unique AI software system. Once the Chunks have been created, they are sent together with the metadata to ES to be indexed with BM25 and to Qdrant to be stored as vector embeddings.
The second context is Unique AI Hybrid Search: The end user (or a Custom App) types a query in the agent chat UI (or submits a search request via the public API) and triggers a search request via the Unique AI software system. The search request (including relevant metadata) is then forwarded to ES and Qdrant, which match and return relevant Chunks.

Container overview
The container diagram for Unique AI Ingestion depicts how a file uploaded by an end user is processed to make it available in Search.
