LiteLLM – Infrastructure Setup

4 min read

This page describes the technical setup for running LiteLLM as an AI Gateway for the Unique platform. It is intended for DevOps, platform, and infrastructure teams operating a self-hosted Unique deployment.

For provider approval, data residency, model training, abuse monitoring, content filtering, and other security or compliance decisions, see LiteLLM - AI Gateway.

Overview

LiteLLM runs as an API gateway between Unique platform services and configured AI model providers. The Unique platform sends requests to LiteLLM; LiteLLM routes each request to the configured provider/model based on its model configuration.

In self-hosted deployments, the client is responsible for provisioning LiteLLM, operating the required infrastructure, managing provider credentials, and configuring the models that are allowed for the environment. Security and compliance approval for providers and models must be completed according to LiteLLM - AI Gateway.

Runtime Requirements

Component	Requirement	Notes
PostgreSQL	Required	Used by LiteLLM for configuration, usage logs, and analytics. PostgreSQL 12+ is supported by LiteLLM. Plan at least 10 GB and scale with usage.
Redis	Recommended	Used for distributed caching. Redis 6+ is recommended. Size memory according to cache volume and TTL.
Kubernetes	Required	Run LiteLLM as a Kubernetes workload, ideally with at least two replicas in production.
Outbound network	Required	Allow HTTPS egress from LiteLLM to the selected AI provider APIs.
Ingress or internal service	Required	Unique backend services must be able to reach the LiteLLM API endpoint.

_Reference:_{LiteLLM database documentation}_,_{LiteLLM caching documentation}_.

Recommended Sizing

Resource	Starting Point	Scale Driver
Replicas	2	Availability and request throughput.
CPU	0.5 vCPU per pod	Request volume, streaming traffic, and provider latency.
Memory	1 GiB per pod	Concurrent requests and caching behavior.
PostgreSQL storage	10 GB minimum	Usage logs, analytics retention, and request volume.
Redis memory	1-4 GB	Cache TTL, payload size, and expected cache hit rate.

These values are starting points. Monitor CPU, memory, provider latency, database growth, and Redis cache hit rates after go-live.

Deploy LiteLLM

Use the upstream LiteLLM Helm chart.

Chart: LiteLLM Helm chart
Deployment guide: LiteLLM Helm deployment documentation
Recommended minimum chart version from the source guide: 0.1.748

Provision database credentials, provider credentials, and the LiteLLM proxy master key through your standard secret management system. Do not store provider keys in plain text Helm values.

Connect Unique to LiteLLM

Configure the Unique chat backend service to call the LiteLLM endpoint.

bash

LITELLM_API_KEY=<your-proxy-master-key>
LITELLM_ENDPOINT=http://litellm.litellm-system.svc.cluster.local:4000

The LITELLM_API_KEY value must match the LiteLLM PROXY_MASTER_KEY. Prefer an internal Kubernetes service DNS name for same-cluster deployments. Use an external HTTPS endpoint only when LiteLLM runs outside the cluster or in a separate network boundary.

Configure Providers and Models

LiteLLM routes requests based on the model names defined in its configuration. A request for a given model name is sent only to the matching configured backend model or deployment.

yaml

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY

Use environment-backed secrets for provider credentials. The os.environ/KEY_NAME syntax references an environment variable inside the LiteLLM container.

yaml

environmentSecrets:
  - litellm-env-secret

yaml

apiVersion: v1
kind: Secret
metadata:
  name: litellm-env-secret
  namespace: litellm
type: Opaque
stringData:
  AZURE_API_KEY: <azure-api-key>
  ANTHROPIC_API_KEY: <anthropic-api-key>
  GEMINI_API_KEY: <gemini-api-key>

Before adding a provider or model, complete the security and compliance checks described in LiteLLM - AI Gateway.

Caching

LiteLLM can use Redis for response caching. This can reduce cost and latency for repeated requests, but it must be enabled intentionally and reviewed for the deployment’s data handling requirements.

Use Redis for distributed caching across multiple LiteLLM replicas.
Configure cache TTL according to expected reuse and data sensitivity.
Monitor cache hit rate, Redis memory usage, and latency.
Be aware that LiteLLM Redis caching is key-scoped by default, not user-scoped.

Any caching decision that affects data retention or sharing behavior must be reviewed against LiteLLM - AI Gateway.

Network and Access

Allow inbound traffic to LiteLLM only from Unique platform services or approved operational access paths.
- Important: Unique leverages LiteLLM solely as egress gateway. There is absolutely no need to expose the LiteLLM anywhere! If you prefer todo so, restrict and govern the audience!
Allow outbound HTTPS traffic only to approved AI provider APIs and required cloud endpoints.
Use TLS for external endpoints.
Restrict access to the LiteLLM admin UI and proxy master key.
Rotate provider credentials and the proxy master key according to your standard credential policy.

For provider, region, and access-control implications beyond the infrastructure layer, see LiteLLM - AI Gateway.

Operations

Back up the PostgreSQL database and test restore procedures.
Version-control Helm values and model configuration, excluding secrets.
Back up or recreate Kubernetes secrets through your approved secret management system.
Monitor request rates, provider errors, LiteLLM pod health, database growth, Redis usage, and outbound network failures.
Define an incident procedure for provider outages, model disablement, credential compromise, and database recovery.

Troubleshooting

Symptom	Check
LiteLLM cannot start	Check PostgreSQL connectivity, database credentials, required environment variables, and Helm values.
Unique cannot reach LiteLLM	Check `LITELLM_ENDPOINT`, service DNS, namespace, network policies, ingress, and TLS configuration.
Authentication fails	Verify that `LITELLM_API_KEY` in the Unique backend matches LiteLLM `PROXY_MASTER_KEY`.
Provider API errors	Validate provider credentials, model names, provider account status, region availability, and outbound HTTPS connectivity.
High latency or cost	Inspect provider latency, request volume, Redis cache hit rate, CPU/memory pressure, and database performance.
Model unavailable	Confirm the model is configured in LiteLLM, enabled at the provider, available in the selected region, and approved according to LiteLLM - AI Gateway.

LiteLLM – Infrastructure Setup

Overview

Runtime Requirements

Recommended Sizing

Deploy LiteLLM

Connect Unique to LiteLLM

Configure Providers and Models

Caching

Network and Access

Operations

Troubleshooting

Reference Links