Elasticsearch Infrastructure


Description

Elasticsearch (ES) complements dense retrieval (vector search with Qdrant) in the Unique platform by adding the state-of-the-art sparse retrieval technique BM25. Elasticsearch provides superior relevance scoring for keyword-based queries and improves overall search performance. The platform previously used PostgreSQL's built-in full-text search with n-gram-based similarity matching (pg_trgm extension). While functional, this approach had several limitations for FSI requirements.

Planning

Elasticsearch Deployment Options: Self-hosted vs Managed Services

When using our Elasticsearch-powered feature, you'll need to decide how to deploy the underlying Elasticsearch infrastructure. This choice affects operational overhead, costs, and performance.

Use Your Existing Elasticsearch Deployment

If you already have Elasticsearch running, you can connect our feature directly to your existing cluster. This is often the most cost-effective option since you're leveraging infrastructure you're already maintaining. You'll need Elasticsearch version 7.x or higher with available storage capacity.

Managed Elasticsearch Services

Managed services like Amazon OpenSearch Service, Elastic Cloud, and Google Cloud Elasticsearch offer the fastest deployment path. They provide automatic maintenance, built-in scaling, comprehensive monitoring, and high availability. However, they come with higher ongoing costs and less configuration control.

Self-hosted Elasticsearch

Self-hosting provides complete cost control and customization capabilities while maintaining data sovereignty. The trade-off is significant operational overhead requiring Elasticsearch expertise, plus responsibility for all maintenance, updates, and monitoring.

Our Recommendation

Use your existing Elasticsearch if you already have a running cluster with available capacity. Start with a managed service if you lack Elasticsearch experience. Consider self-hosting if you have a dedicated infrastructure team or strict compliance requirements. The following sections explain how to set up and configure the service using the ECK operator.

Budget

The resources to dedicate depend on the volume of documents and the number of users. For details, see the Sizing section.

The cost incurred will depend on the pricing model of your cloud provider.

Examples

note

These are rough estimates. The actual costs depend on usage patterns, deployment regions, and pricing variations. Use the following at your own discretion!

Example 1

With 100 users, a 3-node Elasticsearch cluster, and 5 GB of stored data, you can expect:

2 CPUs * 3 nodes = 6 CPUs
2.5 Gi Memory * 3 nodes = 7.5 Gi Memory
8 (5 + 3 buffer) GB * 3 nodes = 24 GB Storage

With Azure pricing in Switzerland North, this can be one D8s v5 node (370$) and 3 ZRS disks with 8 GB capacity each (10$), making a total of:
380$ per month

Example 2

With 5000 users, a 3-node Elasticsearch cluster, and 50 GB of stored data, the cost impact will be roughly:

4.5 CPUs * 3 nodes = 13.5 CPUs
4.5 Gi Memory * 3 nodes = 13.5 Gi Memory
80 (50 + 30 buffer) GB * 3 nodes = 240 GB Storage

With Azure pricing in Switzerland North, this can be two D8s v5 nodes (740$) and 3 ZRS disks with 128 GiB capacity each (70$), making a total of:
810$ per month
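
The arithmetic behind both examples can be reproduced with a short sketch. The helper and field names are illustrative only; the per-node figures are the ones used in the examples above, and the assumption that every node holds the full data volume plus buffer is taken from those examples.

```python
from dataclasses import dataclass


@dataclass
class ClusterTotals:
    cpus: float
    memory_gi: float
    storage_gb: float


def cluster_totals(nodes, cpu_per_node, memory_gi_per_node, data_gb, buffer_gb):
    """Multiply per-node resources by the node count (hypothetical helper)."""
    # As in the examples, each node is provisioned with data plus buffer.
    storage_per_node = data_gb + buffer_gb
    return ClusterTotals(
        cpus=cpu_per_node * nodes,
        memory_gi=memory_gi_per_node * nodes,
        storage_gb=storage_per_node * nodes,
    )


example_1 = cluster_totals(nodes=3, cpu_per_node=2, memory_gi_per_node=2.5,
                           data_gb=5, buffer_gb=3)
example_2 = cluster_totals(nodes=3, cpu_per_node=4.5, memory_gi_per_node=4.5,
                           data_gb=50, buffer_gb=30)
print(example_1)
print(example_2)
```

Pricing is deliberately left out of the sketch because it depends on the cloud provider, region, and instance type.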

Provisioning

Pre-requisites

Familiarize yourself with the Elasticsearch compatibility matrix for operating systems and make sure that your OS is supported.

Make sure that your Kubernetes version is compatible by consulting the compatibility matrix.

Deployment

If you deploy a standalone Elasticsearch cluster instead of using a managed service, we recommend using the ECK operator.

Helm chart: elastic/eck-operator

Operator (Chart) Version: >= 3.0.0

Elasticsearch version: >= 9.0.2

To keep the deployment simple, we recommend setting node.store.allow_mmap: false.

For resource allocation recommendations, see Sizing.

Environment variables and secrets

Name

Component

Description

Example value

Default

Required

ELASTICSEARCH_CA_BUNDLE

node-ingestion

Path to the Elasticsearch CA certificate. When using ECK, it can be mounted into the pods from the secret named <<es-deployment-name>>-es-http-certs-public.

/etc/ssl/elastic/ca.crt

""

Only if using a private CA or a self-signed certificate

ELASTICSEARCH_URL

node-ingestion

When using ECK, by default the value is `https://<<es-deployment-name>>-es-http.chat.svc:9200`

https://elasticsearch-ingestion-es-http.chat.svc:9200

""

Yes, if FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING=true

ELASTICSEARCH_USERNAME

node-ingestion

ES user name

elastic

""

Yes

ELASTICSEARCH_PASSWORD

node-ingestion

ES password. If using the ECK operator, this value is set automatically and is available in the Kubernetes secret <<es-deployment-name>>-es-elastic-user.

password

""

Yes

FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING

node-ingestion

Should be set to "true" once the ES cluster is up and running

"false"

"false"

See description

FEATURE_FLAG_ENABLE_SEARCH_ADAPTER_ELASTICSEARCH_UN_9883

node-ingestion

Should be set to "true" only after the initial indexing has completed successfully, see below

"false"

"false"

See description

ELASTICSEARCH_MAX_RETRIES

node-ingestion

Sets the maximum number of retry attempts for failed requests

5

No

ELASTICSEARCH_REQUEST_TIMEOUT

node-ingestion

Sets the timeout in milliseconds for individual requests

60000

No

ELASTICSEARCH_SNIFF_ON_START

node-ingestion

Controls whether the client should discover other cluster nodes on initialization. See what is sniffing

"false"

No
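
The "Required" column above can be turned into a simple pre-flight check. The following is a minimal sketch, not part of node-ingestion: the function name is hypothetical, and it assumes that ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD are always required while ELASTICSEARCH_URL is only required once FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING is "true", as listed in the table.

```python
def missing_required_settings(env):
    """Return the names of required settings that are absent or empty.

    `env` is any mapping of variable names to string values.
    """
    required = ["ELASTICSEARCH_USERNAME", "ELASTICSEARCH_PASSWORD"]
    # The URL only becomes mandatory once indexing is switched on.
    if env.get("FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING", "false") == "true":
        required.append("ELASTICSEARCH_URL")
    return [name for name in required if not env.get(name)]


incomplete = {
    "FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING": "true",
    "ELASTICSEARCH_USERNAME": "elastic",
}
print(missing_required_settings(incomplete))
```

To check the current process environment, pass `dict(os.environ)` after importing `os`.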

Real-life deployment example

Step 1 - Deploying Elasticsearch cluster

https://github.com/Unique-AG/hello-azure/commit/79d0fda466633d38d1f2f4dbce5bbaed8da94efd

Step 2 - Enabling Elastic indexing

https://github.com/Unique-AG/hello-azure/commit/f3ccf0ec2d0c424a1ea2b0fdca9e164fdfa721c7

Step 3 - Triggering Reindexing using the service user

bash
# Request an access token for the service user (client credentials grant)
ACCESS_TOKEN=$(curl -s --location 'https://id.hello.azure.unique.dev/oauth/v2/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
-u "es-ingestion:${ES_SERVICE_USER_CLIENT_ID}" \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode "scope=openid profile email urn:zitadel:iam:user:resourceowner urn:zitadel:iam:org:projects:roles urn:zitadel:iam:org:project:id:${PROJECT_ID}:aud" | jq -r .access_token)

# Trigger a full rebuild of the Elasticsearch indexes
curl --location --request POST 'https://api.hello.azure.unique.dev/ingestion/v1/maintenance/rebuild-full-elasticsearch-indexes' \
--header "Authorization: Bearer $ACCESS_TOKEN"

Step 4 - Enabling search with Elastic

https://github.com/Unique-AG/hello-azure/commit/2240f7a9a69e3f48db5e9cee26fa86354f25826f

Connectivity remarks

  • Currently, the only supported way to connect to the Elasticsearch cluster is basic authentication (by passing ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD to the node-ingestion service).

  • The authenticated user should have permission to create indices.

  • To fine-tune the connection settings, use the following environment variables: ELASTICSEARCH_MAX_RETRIES, ELASTICSEARCH_REQUEST_TIMEOUT, ELASTICSEARCH_SNIFF_ON_START. For details, see the Environment variables and secrets section.

Upgrades

Running a 3-node cluster allows, in principle, for zero-downtime upgrades. For details, consult the Elasticsearch guide. Make sure to cross-check the operator's compatibility with the Elasticsearch version you want to deploy.
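
Why three nodes are enough can be checked with simple arithmetic: Elasticsearch needs a majority of its master-eligible nodes to be available, so taking one node down at a time leaves a 3-node cluster with a working majority. The sketch below assumes that all nodes are master-eligible; the function names are illustrative.

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_unavailable(master_eligible_nodes):
    """How many nodes can be down while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for nodes in (1, 2, 3):
    print(nodes, quorum(nodes), tolerated_unavailable(nodes))
```

With one or two nodes, no node can be taken down without losing the majority, which is why a rolling upgrade without downtime needs at least three.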

Sizing

Compute / memory

We recommend a cluster consisting of 3 nodes for high availability.

The recommended resource allocations per node are as follows:

Concurrent Users | Node CPU / Memory
10 | 200-500m / 2-2.5Gi
100 | 2000-2500m / 2.5-3Gi
1000 | 3500-4000m / 3.5-4Gi
4000 | 4000-4500m / 4-4.5Gi
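
For user counts that fall between the listed tiers, one reasonable reading is to round up to the next tier. The lookup below is a sketch under that assumption; the rounding rule and the function name are not part of the official recommendation, and the table gives no guidance beyond 4000 concurrent users.

```python
# (max concurrent users, CPU per node, memory per node) from the sizing table
SIZING_TIERS = [
    (10, "200-500m", "2-2.5Gi"),
    (100, "2000-2500m", "2.5-3Gi"),
    (1000, "3500-4000m", "3.5-4Gi"),
    (4000, "4000-4500m", "4-4.5Gi"),
]


def recommended_node_resources(concurrent_users):
    """Return (cpu, memory) of the smallest tier covering the user count."""
    for max_users, cpu, memory in SIZING_TIERS:
        if concurrent_users <= max_users:
            return cpu, memory
    raise ValueError("no sizing recommendation above 4000 concurrent users")


print(recommended_node_resources(100))
print(recommended_node_resources(500))
```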

Volume storage

In contrast to Qdrant, Elasticsearch stores the Chunk text in addition to the index. The storage volume should therefore be estimated based on the size of the Chunk table in the unique-ingestion-development database plus the approximate volume of uploaded files.

The storage should be SSD-based, zone-redundant and support volume expansion.
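
The estimate can be combined with the 80% expansion threshold mentioned under Troubleshooting to pick an initial volume size. This is a minimal sketch with a hypothetical function name; the input figures in the usage lines are made up for illustration.

```python
import math


def recommended_volume_gb(chunk_table_gb, upload_volume_gb, max_fill_percent=80):
    """Volume size that keeps the estimated data below the fill threshold."""
    estimated_gb = chunk_table_gb + upload_volume_gb
    # Scale up so that the estimate occupies at most max_fill_percent.
    return math.ceil(estimated_gb * 100 / max_fill_percent)


print(recommended_volume_gb(3, 5))
print(recommended_volume_gb(40, 10))
```

Because the volume should support expansion, starting close to this value and growing it later is an option.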

Initial indexing

Prerequisites:

  • The ES cluster must be up and running.

  • The ES cluster is connected to the node-ingestion service. When the service starts, you should find the following entry in the logs: [ElasticSearchService] Elasticsearch: Successfully connected. If the connection failed, an error is logged instead.

  • FEATURE_FLAG_ENABLE_ELASTICSEARCH_INDEXING should be set to "true".
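
The connection check can be automated by searching the service logs for the success message. The sketch below only assumes the message text quoted above; the sample log lines are made up, and real lines will carry additional fields such as timestamps.

```python
CONNECTED_MARKER = "[ElasticSearchService] Elasticsearch: Successfully connected"


def is_connected(log_lines):
    """Return True if any log line contains the success message."""
    return any(CONNECTED_MARKER in line for line in log_lines)


# Hypothetical log excerpt for illustration only.
sample_logs = [
    "starting node-ingestion",
    "[ElasticSearchService] Elasticsearch: Successfully connected",
]
print(is_connected(sample_logs))  # True
```

The log lines can come from any source, for example the output of `kubectl logs` for the ingestion pod.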

With the ES cluster up and running and connected to the node-ingestion service, the feature flag enabled, and the connection verified in the logs, it is now time to re-index the existing chunk data. New chunk data is already being indexed into ES, but the existing chunks must be re-indexed using the Elasticsearch Reindexing Job.

The Elasticsearch Reindexing Job is defined as a Kubernetes CronJob. The schedule is set to a date that never occurs, the 31st of February, to prevent automatic execution. The job should be triggered manually when needed, ideally only once after setting up the Elasticsearch cluster.
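
That such a schedule never fires can be verified quickly: a cron entry with day-of-month 31 and month 2 only matches a date on which February has a 31st day. The sketch uses Python's standard calendar module; the function name and the year range are arbitrary choices for illustration.

```python
import calendar


def february_has_day(day, years=range(1970, 2401)):
    """Return True if February contains the given day in any of the years."""
    # monthrange returns (weekday of the first day, number of days in month).
    return any(calendar.monthrange(year, 2)[1] >= day for year in years)


print(february_has_day(29))  # True (leap years)
print(february_has_day(31))  # False
```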

Job Definition Setup

Requirements

Setting up the reindexing job requires deploying the ingestion service with the backend-service chart in version >=4.5.0.

The ingestion service (also known as node-ingestion, bs-ingestion, or backend-service-ingestion) is referred to as INGESTION_SERVICE_NAME in the code snippets. This is the Helm release name of the ingestion service.

The namespace where ingestion service is deployed will be referred to as NAMESPACE.

Setup

In the deployment of ingestion service add the following to your existing Helm values:

yaml
extraCronJobs:
  elasticsearch-indexing:
    schedule: "* * 31 2 *"
    restartPolicy: OnFailure
    env:
      RUNNING_MODE: elasticsearch-indexing

Option 1: Trigger from Argo CD UI

You can trigger the CronJob directly from the Argo CD web interface.

Steps

  1. Open Argo CD UI and go to the Application managing the job (e.g., node-ingestion).

  2. In the resource tree, find and click on:

    yaml
    CronJob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing
  3. Click the "Create Job" button.

  4. Confirm to launch a one-time job from the CronJob spec.

Option 2: Trigger via kubectl

Use this method to trigger the job from the command line.

Steps

  1. Create a one-time job from the CronJob:

    bash
    kubectl create job \
      --from=cronjob/<<INGESTION_SERVICE_NAME>>-elasticsearch-indexing \
      <<INGESTION_SERVICE_NAME>>-elasticsearch-manual-$(date +%s) \
      -n <<NAMESPACE>>
  2. Check job status:

    bash
    kubectl get jobs -n <<NAMESPACE>>

Elasticsearch Indexing Performance Configuration

Overview

This section describes the key environment variables that control Elasticsearch indexing performance in the node-ingestion service. These settings are read from the ConfigMap or .env file and allow tuning of batch behavior and concurrency.


Indexing Configuration Variables

Variable | Default | Description
ELASTICSEARCH_INDEXING_BULK_BATCH_SIZE | 2000 | Number of documents submitted in a single Elasticsearch bulk request.
ELASTICSEARCH_INDEXING_ENTRIES_PER_BATCH | 2000 | Number of data entries grouped into one batch before preparing for ingest.
ELASTICSEARCH_INDEXING_COMPANY_CONCURRENCY_LIMIT | 1 | Maximum number of companies processed in parallel during indexing.

Notes

  • Bulk Batch Size and Entries Per Batch allow independent tuning for memory usage vs throughput.

  • Company Concurrency Limit helps prevent resource exhaustion when multiple companies are being indexed simultaneously.

  • All settings are adjustable through environment variables or config maps.
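
To get a feeling for the effect of the bulk batch size, the sketch below reads the three variables with their documented defaults and splits a list of documents into bulk-sized groups. It illustrates the batching arithmetic only; how node-ingestion batches internally is not described here, and the function names are hypothetical.

```python
import os


def load_indexing_config(env=None):
    """Read the indexing settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "bulk_batch_size": int(env.get("ELASTICSEARCH_INDEXING_BULK_BATCH_SIZE", "2000")),
        "entries_per_batch": int(env.get("ELASTICSEARCH_INDEXING_ENTRIES_PER_BATCH", "2000")),
        "company_concurrency_limit": int(
            env.get("ELASTICSEARCH_INDEXING_COMPANY_CONCURRENCY_LIMIT", "1")
        ),
    }


def split_into_batches(items, batch_size):
    """Split a list into consecutive groups of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


config = load_indexing_config({})  # empty mapping, so defaults apply
documents = list(range(5000))      # stand-in for 5000 documents
bulk_requests = split_into_batches(documents, config["bulk_batch_size"])
print([len(batch) for batch in bulk_requests])  # [2000, 2000, 1000]
```

A smaller batch size yields more, smaller bulk requests, which lowers memory use per request at the cost of more round trips.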

Operating & troubleshooting

Authentication methods

ES authentication is done via username and password, see env variables above.

Troubleshooting

  • How can I verify that the node-ingestion service is connected to the ES cluster?

    • You should find [ElasticSearchService] Elasticsearch: Successfully connected in the logs of the node-ingestion service when the service was started. Otherwise, an error will be displayed.

  • How can service outages and indexing failures be monitored / noticed?

    • When an ES outage occurs, the ingestion of files fails and the failure is visible in the UI.

  • How can we resolve failures during indexing?

    • Use the curl command described in Step 3 to trigger a full rebuild of the indexes.

  • When should the memory / volume be increased?

    • At the latest when 80% of the memory or volume capacity is in use.

  • How long does the initial indexing take?

    • The timeline for re-indexing depends on the volume of Chunk data in the database and may take multiple hours to complete.

  • Will there be any downtime for updating / upgrading versions?

    • No downtime should be expected as the ECK operator will perform a rolling update one node at a time.

Architecture overview

System context

The diagram illustrates the system context for Elasticsearch (ES), highlighting the high-level interactions between a human end user (or Custom App) and the underlying software systems. ES is used in two distinct use cases:

  1. The first context is Unique AI Ingestion: The end user (or a Custom App) uploads a file from the Knowledge Base or Chat UI. The file is uploaded via secure HTTPS to the File Storage, and the Ingestion (extraction of the text and creation of Chunks) is triggered in the Unique AI software system. Once the Chunks have been created, they are sent together with their metadata to ES, where they are indexed for BM25 retrieval, and to Qdrant, where they are stored as vector embeddings.

  2. The second context is Unique AI Hybrid Search: The end user (or a Custom App) types a query in the agent chat UI (or submits a search request via the public API) and triggers a search request via the Unique AI software system. The search request (including relevant metadata) is then forwarded to ES and Qdrant, which match and return relevant Chunks.

Diagram: System context

Container overview

The container diagram for Unique AI Ingestion depicts how a file uploaded by an end user is processed to make it available in Search.

Diagram: Container overview for Unique AI Ingestion