LiteLLM – Infrastructure Setup


This page describes the technical setup for running LiteLLM as an AI Gateway for the Unique platform. It is intended for DevOps, platform, and infrastructure teams operating a self-hosted Unique deployment.

For provider approval, data residency, model training, abuse monitoring, content filtering, and other security or compliance decisions, see LiteLLM - AI Gateway.

Overview

LiteLLM runs as an API gateway between Unique platform services and the configured AI model providers. The Unique platform sends requests to LiteLLM, and LiteLLM routes each request to the provider and model that match the requested model name in its configuration.

In self-hosted deployments, the client is responsible for provisioning LiteLLM, operating the required infrastructure, managing provider credentials, and configuring the models that are allowed for the environment. Security and compliance approval for providers and models must be completed according to LiteLLM - AI Gateway.

Runtime Requirements

| Component | Requirement | Notes |
| --- | --- | --- |
| PostgreSQL | Required | Used by LiteLLM for configuration, usage logs, and analytics. PostgreSQL 12+ is supported by LiteLLM. Plan at least 10 GB and scale with usage. |
| Redis | Recommended | Used for distributed caching. Redis 6+ is recommended. Size memory according to cache volume and TTL. |
| Kubernetes | Required | Run LiteLLM as a Kubernetes workload, ideally with at least two replicas in production. |
| Outbound network | Required | Allow HTTPS egress from LiteLLM to the selected AI provider APIs. |
| Ingress or internal service | Required | Unique backend services must be able to reach the LiteLLM API endpoint. |

Reference: LiteLLM database documentation, LiteLLM caching documentation.

| Resource | Starting Point | Scale Driver |
| --- | --- | --- |
| Replicas | 2 | Availability and request throughput. |
| CPU | 0.5 vCPU per pod | Request volume, streaming traffic, and provider latency. |
| Memory | 1 GiB per pod | Concurrent requests and caching behavior. |
| PostgreSQL storage | 10 GB minimum | Usage logs, analytics retention, and request volume. |
| Redis memory | 1-4 GB | Cache TTL, payload size, and expected cache hit rate. |

These values are starting points. Monitor CPU, memory, provider latency, database growth, and Redis cache hit rates after go-live.
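
As a rough sketch, the starting points above could be expressed as Helm values like the following. The `replicaCount` and `resources` keys follow the common Helm chart convention and are assumed to be exposed by the upstream LiteLLM chart; verify them against the `values.yaml` of the chart version you deploy.

yaml
# Assumed Helm value keys; confirm against the chart's values.yaml.
replicaCount: 2          # two replicas for availability
resources:
  requests:
    cpu: 500m            # 0.5 vCPU per pod
    memory: 1Gi          # 1 GiB per pod

Limits are deliberately left out of this sketch. Set them only after observing real CPU and memory usage under production traffic.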

Deploy LiteLLM

Use the upstream LiteLLM Helm chart.

Provision database credentials, provider credentials, and the LiteLLM proxy master key through your standard secret management system. Do not store provider keys in plain text in Helm values.
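
A minimal sketch of Helm values that reference existing Kubernetes secrets instead of inline credentials might look like this. The key names (`masterkeySecretName`, `masterkeySecretKey`, `db.useExisting`, `db.secret`) are assumed from the upstream chart and may differ between chart versions; the secret names and the database host are hypothetical placeholders.

yaml
# Sketch only: key names assumed from the upstream chart, names and host are hypothetical.
masterkeySecretName: litellm-masterkey      # hypothetical Secret holding the proxy master key
masterkeySecretKey: masterkey               # key inside that Secret
db:
  useExisting: true                         # use an externally managed PostgreSQL instance
  deployStandalone: false                   # do not deploy a bundled database
  endpoint: postgres.example.internal       # hypothetical database host
  database: litellm
  secret:
    name: litellm-db-credentials            # hypothetical Secret with database credentials
    usernameKey: username
    passwordKey: password

The referenced secrets would be created by your secret management system, so the values file itself can be version-controlled without containing any credentials.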

Connect Unique to LiteLLM

Configure the Unique chat backend service to call the LiteLLM endpoint.

bash
LITELLM_API_KEY=<your-proxy-master-key>
LITELLM_ENDPOINT=http://litellm.litellm-system.svc.cluster.local:4000

The LITELLM_API_KEY value must match the LiteLLM PROXY_MASTER_KEY. Prefer an internal Kubernetes service DNS name for same-cluster deployments. Use an external HTTPS endpoint only when LiteLLM runs outside the cluster or in a separate network boundary.
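
If the Unique chat backend is configured through container environment variables, one way to avoid a plain-text key is to source it from a Kubernetes secret. The fragment below is a sketch under that assumption; the secret name and key are hypothetical, and how the backend's environment is actually set depends on your Unique deployment.

yaml
# Container env fragment (sketch); secret name and key are hypothetical.
env:
  - name: LITELLM_ENDPOINT
    value: http://litellm.litellm-system.svc.cluster.local:4000
  - name: LITELLM_API_KEY
    valueFrom:
      secretKeyRef:
        name: litellm-masterkey   # hypothetical Secret name
        key: masterkey            # hypothetical key within the Secret

A `secretKeyRef` can only read a Secret in the same namespace as the pod, so the key must also be available in the namespace where the chat backend runs.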

Configure Providers and Models

LiteLLM routes requests based on the model names defined in its configuration. A request for a given model name is sent only to the matching configured backend model or deployment.

yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-pro
    litellm_params:
      model: gemini/gemini-pro
      api_key: os.environ/GEMINI_API_KEY

Use environment-backed secrets for provider credentials. The os.environ/KEY_NAME syntax references an environment variable inside the LiteLLM container.

yaml
environmentSecrets:
  - litellm-env-secret

yaml
apiVersion: v1
kind: Secret
metadata:
  name: litellm-env-secret
  namespace: litellm-system  # must be the namespace where LiteLLM runs
type: Opaque
stringData:
  AZURE_API_KEY: <azure-api-key>
  ANTHROPIC_API_KEY: <anthropic-api-key>
  GEMINI_API_KEY: <gemini-api-key>

Before adding a provider or model, complete the security and compliance checks described in LiteLLM - AI Gateway.

Caching

LiteLLM can use Redis for response caching. This can reduce cost and latency for repeated requests, but it must be enabled intentionally and reviewed for the deployment’s data handling requirements.

  • Use Redis for distributed caching across multiple LiteLLM replicas.

  • Configure cache TTL according to expected reuse and data sensitivity.

  • Monitor cache hit rate, Redis memory usage, and latency.

  • Be aware that LiteLLM Redis caching is key-scoped by default, not user-scoped.

Any caching decision that affects data retention or sharing behavior must be reviewed against LiteLLM - AI Gateway.
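
For illustration, enabling Redis caching in the LiteLLM configuration could look roughly like the sketch below. It assumes the Redis connection details are supplied through the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` environment variables in the LiteLLM container, and the TTL is an illustrative value rather than a recommendation.

yaml
# Sketch of LiteLLM caching settings; the TTL value is illustrative.
litellm_settings:
  cache: true
  cache_params:
    type: redis
    ttl: 600        # seconds; choose according to expected reuse and data sensitivity

Leaving `cache` unset keeps caching disabled, which matches the requirement that it is enabled intentionally.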

Network and Access

  • Allow inbound traffic to LiteLLM only from Unique platform services or approved operational access paths.

    • Important: Unique uses LiteLLM solely as an egress gateway, so there is no need to expose LiteLLM outside the cluster. If you choose to do so, restrict and govern who can reach it.

  • Allow outbound HTTPS traffic only to approved AI provider APIs and required cloud endpoints.

  • Use TLS for external endpoints.

  • Restrict access to the LiteLLM admin UI and proxy master key.

  • Rotate provider credentials and the proxy master key according to your standard credential policy.

For provider, region, and access-control implications beyond the infrastructure layer, see LiteLLM - AI Gateway.
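
The inbound and outbound rules above can be approximated with a Kubernetes NetworkPolicy. The following is a sketch, not a complete policy: the pod label, the `unique` namespace name, and the default PostgreSQL and Redis ports are assumptions to adapt to your environment.

yaml
# Sketch of a NetworkPolicy for LiteLLM; labels, namespace names, and ports are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: litellm-restrict
  namespace: litellm-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: litellm             # assumed label on the LiteLLM pods
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: unique  # hypothetical namespace of the Unique backend
      ports:
        - protocol: TCP
          port: 4000                               # LiteLLM API port
  egress:
    - ports:                                       # DNS resolution
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - ports:                                       # HTTPS to provider APIs
        - protocol: TCP
          port: 443
    - ports:                                       # PostgreSQL and Redis, default ports assumed
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 6379

A standard NetworkPolicy filters by IP block, namespace, or pod, not by hostname. Limiting egress to specific provider APIs therefore usually requires an egress proxy or a CNI plugin that supports FQDN-based rules.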

Operations

  • Back up the PostgreSQL database and test restore procedures.

  • Version-control Helm values and model configuration, excluding secrets.

  • Back up or recreate Kubernetes secrets through your approved secret management system.

  • Monitor request rates, provider errors, LiteLLM pod health, database growth, Redis usage, and outbound network failures.

  • Define an incident procedure for provider outages, model disablement, credential compromise, and database recovery.

Troubleshooting

| Symptom | Check |
| --- | --- |
| LiteLLM cannot start | Check PostgreSQL connectivity, database credentials, required environment variables, and Helm values. |
| Unique cannot reach LiteLLM | Check LITELLM_ENDPOINT, service DNS, namespace, network policies, ingress, and TLS configuration. |
| Authentication fails | Verify that LITELLM_API_KEY in the Unique backend matches LiteLLM PROXY_MASTER_KEY. |
| Provider API errors | Validate provider credentials, model names, provider account status, region availability, and outbound HTTPS connectivity. |
| High latency or cost | Inspect provider latency, request volume, Redis cache hit rate, CPU/memory pressure, and database performance. |
| Model unavailable | Confirm the model is configured in LiteLLM, enabled at the provider, available in the selected region, and approved according to LiteLLM - AI Gateway. |
