LiteLLM – Infrastructure Setup
4 min read
This page describes the technical setup for running LiteLLM as an AI Gateway for the Unique platform. It is intended for DevOps, platform, and infrastructure teams operating a self-hosted Unique deployment.
For provider approval, data residency, model training, abuse monitoring, content filtering, and other security or compliance decisions, see LiteLLM - AI Gateway.
Overview
LiteLLM runs as an API gateway between Unique platform services and configured AI model providers. The Unique platform sends requests to LiteLLM; LiteLLM routes each request to the configured provider/model based on its model configuration.
In self-hosted deployments, the client is responsible for provisioning LiteLLM, operating the required infrastructure, managing provider credentials, and configuring the models that are allowed for the environment. Security and compliance approval for providers and models must be completed according to LiteLLM - AI Gateway.
Runtime Requirements
Component | Requirement | Notes |
|---|---|---|
PostgreSQL | Required | Used by LiteLLM for configuration, usage logs, and analytics. PostgreSQL 12+ is supported by LiteLLM. Plan at least 10 GB and scale with usage. |
Redis | Recommended | Used for distributed caching. Redis 6+ is recommended. Size memory according to cache volume and TTL. |
Kubernetes | Required | Run LiteLLM as a Kubernetes workload, ideally with at least two replicas in production. |
Outbound network | Required | Allow HTTPS egress from LiteLLM to the selected AI provider APIs. |
Ingress or internal service | Required | Unique backend services must be able to reach the LiteLLM API endpoint. |
Reference: LiteLLM database documentation, LiteLLM caching documentation.
Recommended Sizing
Resource | Starting Point | Scale Driver |
|---|---|---|
Replicas | 2 | Availability and request throughput. |
CPU | 0.5 vCPU per pod | Request volume, streaming traffic, and provider latency. |
Memory | 1 GiB per pod | Concurrent requests and caching behavior. |
PostgreSQL storage | 10 GB minimum | Usage logs, analytics retention, and request volume. |
Redis memory | 1-4 GB | Cache TTL, payload size, and expected cache hit rate. |
These values are starting points. Monitor CPU, memory, provider latency, database growth, and Redis cache hit rates after go-live.
Deploy LiteLLM
Use the upstream LiteLLM Helm chart.
Chart: LiteLLM Helm chart
Deployment guide: LiteLLM Helm deployment documentation
Recommended minimum chart version from the source guide:
0.1.748
Provision database credentials, provider credentials, and the LiteLLM proxy master key through your standard secret management system. Do not store provider keys in plain text Helm values.
Connect Unique to LiteLLM
Configure the Unique chat backend service to call the LiteLLM endpoint.
LITELLM_API_KEY=<your-proxy-master-key>
LITELLM_ENDPOINT=http://litellm.litellm-system.svc.cluster.local:4000The LITELLM_API_KEY value must match the LiteLLM PROXY_MASTER_KEY. Prefer an internal Kubernetes service DNS name for same-cluster deployments. Use an external HTTPS endpoint only when LiteLLM runs outside the cluster or in a separate network boundary.
Configure Providers and Models
LiteLLM routes requests based on the model names defined in its configuration. A request for a given model name is sent only to the matching configured backend model or deployment.
model_list:
- model_name: gpt-4o
litellm_params:
model: azure/gpt-4o
api_key: os.environ/AZURE_API_KEY
- model_name: claude-sonnet
litellm_params:
model: anthropic/claude-sonnet-4-5
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: gemini-pro
litellm_params:
model: gemini/gemini-pro
api_key: os.environ/GEMINI_API_KEYUse environment-backed secrets for provider credentials. The os.environ/KEY_NAME syntax references an environment variable inside the LiteLLM container.
environmentSecrets:
- litellm-env-secretapiVersion: v1
kind: Secret
metadata:
name: litellm-env-secret
namespace: litellm
type: Opaque
stringData:
AZURE_API_KEY: <azure-api-key>
ANTHROPIC_API_KEY: <anthropic-api-key>
GEMINI_API_KEY: <gemini-api-key>Before adding a provider or model, complete the security and compliance checks described in LiteLLM - AI Gateway.
Caching
LiteLLM can use Redis for response caching. This can reduce cost and latency for repeated requests, but it must be enabled intentionally and reviewed for the deployment’s data handling requirements.
Use Redis for distributed caching across multiple LiteLLM replicas.
Configure cache TTL according to expected reuse and data sensitivity.
Monitor cache hit rate, Redis memory usage, and latency.
Be aware that LiteLLM Redis caching is key-scoped by default, not user-scoped.
Any caching decision that affects data retention or sharing behavior must be reviewed against LiteLLM - AI Gateway.
Network and Access
Allow inbound traffic to LiteLLM only from Unique platform services or approved operational access paths.
Important: Unique leverages LiteLLM solely as egress gateway. There is absolutely no need to expose the LiteLLM anywhere! If you prefer todo so, restrict and govern the audience!
Allow outbound HTTPS traffic only to approved AI provider APIs and required cloud endpoints.
Use TLS for external endpoints.
Restrict access to the LiteLLM admin UI and proxy master key.
Rotate provider credentials and the proxy master key according to your standard credential policy.
For provider, region, and access-control implications beyond the infrastructure layer, see LiteLLM - AI Gateway.
Operations
Back up the PostgreSQL database and test restore procedures.
Version-control Helm values and model configuration, excluding secrets.
Back up or recreate Kubernetes secrets through your approved secret management system.
Monitor request rates, provider errors, LiteLLM pod health, database growth, Redis usage, and outbound network failures.
Define an incident procedure for provider outages, model disablement, credential compromise, and database recovery.
Troubleshooting
Symptom | Check |
|---|---|
LiteLLM cannot start | Check PostgreSQL connectivity, database credentials, required environment variables, and Helm values. |
Unique cannot reach LiteLLM | Check |
Authentication fails | Verify that |
Provider API errors | Validate provider credentials, model names, provider account status, region availability, and outbound HTTPS connectivity. |
High latency or cost | Inspect provider latency, request volume, Redis cache hit rate, CPU/memory pressure, and database performance. |
Model unavailable | Confirm the model is configured in LiteLLM, enabled at the provider, available in the selected region, and approved according to LiteLLM - AI Gateway. |