LiteLLM – Vertex AI Setup

5 min read

This page describes how to connect a self-hosted LiteLLM deployment to Google Cloud Vertex AI. It is intended for DevOps and platform teams that already operate LiteLLM and need to add Vertex AI models such as Gemini, Claude on Vertex AI, or Mistral on Vertex AI.

For provider approval, data residency, model training, abuse monitoring, content filtering, and contractual review, see LiteLLM - AI Gateway.

For the base LiteLLM deployment, database, Redis, networking, and operational setup, see LiteLLM – Infrastructure Setup.

Scope

This guide covers only the Vertex AI-specific setup:

  • Google Cloud project prerequisites.

  • Workload Identity Federation from Kubernetes to Google Cloud.

  • Partner model enablement in Vertex AI Model Garden.

  • LiteLLM model configuration for Vertex AI models.

  • Region and endpoint checks for Vertex AI models.

It does not define which providers, models, regions, or retention settings are allowed. Those decisions belong in the security and compliance process described in LiteLLM - AI Gateway.

Prerequisites

  • A dedicated Google Cloud project for the target environment, for example one project per tenant and stage.

  • Billing enabled on the Google Cloud project.

  • Vertex AI API enabled: aiplatform.googleapis.com.

  • Permissions to create service accounts, IAM bindings, Workload Identity Federation pools, and Workload Identity Federation providers.

  • A Kubernetes cluster with an OIDC issuer URL.

  • Outbound HTTPS connectivity from LiteLLM pods to Google Cloud APIs.

  • A running LiteLLM deployment. See LiteLLM – Infrastructure Setup.

Reference: Workload Identity Federation with Kubernetes.

Authentication Model

Use Google Cloud Workload Identity Federation (WIF) instead of static service account keys. LiteLLM pods receive a projected Kubernetes ServiceAccount token and exchange it for a short-lived Google Cloud access token. This avoids storing long-lived Google Cloud service account keys in Kubernetes.

Component

Purpose

Google Cloud Service Account

Identity used by LiteLLM to call Vertex AI.

roles/aiplatform.user

Allows the service account to use Vertex AI.

Workload Identity Pool

Trust boundary for external identities from the Kubernetes cluster.

Workload Identity Provider

OIDC provider linked to the Kubernetes cluster issuer.

Kubernetes ServiceAccount

Runtime identity of the LiteLLM pod.

Credentials ConfigMap

Google external account credentials file mounted into the LiteLLM pod.

Setup Steps

1. Create Google Cloud IAM Resources

Create or provision these resources in the target Google Cloud project:

  1. Google Cloud Service Account, for example vertex-ai-workload@<project-id>.iam.gserviceaccount.com.

  2. IAM role binding granting roles/aiplatform.user to that service account.

  3. Workload Identity Pool for the Kubernetes cluster.

  4. Workload Identity Provider using the Kubernetes cluster OIDC issuer URL.

  5. IAM impersonation binding allowing the Kubernetes ServiceAccount subject to impersonate the Google Cloud Service Account via roles/iam.workloadIdentityUser.

Verify the exact Kubernetes subject format expected by your WIF provider. It is commonly based on system:serviceaccount:<namespace>:<service-account>.

Reference: Manage Workload Identity Federation pools and providers.

2. Enable Vertex AI Models

Enable the required models in Vertex AI before adding them to LiteLLM. Google models such as Gemini and partner models such as Anthropic Claude or Mistral may have different availability, terms, and enablement flows.

  • Open Vertex AI Model Garden in the target Google Cloud project.

  • Select the model card for the exact model version.

  • Click Enable.

  • Accept the model-specific terms where required.

  • Confirm billing and quota are available for the selected model and region.

Partner model enablement is per Google Cloud project. Repeat it for every project and environment where the model is needed.

Before enabling a model for production use, complete the checks in LiteLLM - AI Gateway.

3. Mount WIF Credentials into LiteLLM

Create a ConfigMap with the Google external account credentials configuration. Mount it into the LiteLLM pod, together with a projected Kubernetes ServiceAccount token.

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: gcp-wif
data:
  credentials.json: |
    {
      "audience": "//iam.googleapis.com/projects/<gcp-project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>",
      "credential_source": {
        "file": "/var/run/secrets/tokens/gcp-token",
        "format": {
          "type": "text"
        }
      },
      "service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/vertex-ai-workload@<gcp-project-id>.iam.gserviceaccount.com:generateAccessToken",
      "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
      "token_url": "https://sts.googleapis.com/v1/token",
      "type": "external_account"
    }

Configure volumes and volume mounts on the LiteLLM deployment:

yaml
volumes:
  - name: gcp-wif
    configMap:
      name: gcp-wif
  - name: gcp-token
    projected:
      sources:
        - serviceAccountToken:
            audience: "//iam.googleapis.com/projects/<gcp-project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>"
            expirationSeconds: 3600
            path: gcp-token

volumeMounts:
  - name: gcp-wif
    mountPath: /etc/gcp
    readOnly: true
  - name: gcp-token
    mountPath: /var/run/secrets/tokens
    readOnly: true

The token audience in the projected token must match the audience in credentials.json.

4. Set LiteLLM Environment Variables

Set these environment variables on the LiteLLM deployment:

bash
GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/credentials.json
VERTEXAI_PROJECT=<gcp-project-id>

With these values set globally, individual model entries normally only need the model identifier and Vertex AI location. Use per-model overrides only when a model must use a different Google Cloud project or credentials file.

5. Add Vertex AI Models to LiteLLM

Add Vertex AI model entries to the LiteLLM model list. Use the vertex_ai/ model prefix and set vertex_location explicitly.

yaml
model_list:
  - model_name: claude-sonnet-vertex
    litellm_params:
      model: vertex_ai/claude-sonnet-4@20250514
      vertex_location: europe-west1

  - model_name: gemini-pro-vertex
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_location: europe-west6

The model_name is the name exposed through LiteLLM. The litellm_params.model value is the actual Vertex AI model identifier used by LiteLLM.

Reference: LiteLLM Vertex AI provider documentation.

Regions and Endpoints

Always verify region support for the exact model version before enabling it. Google and partner models can differ by region, and preview or partner models may have more limited availability.

  • For strict EU processing, use a European Vertex AI region or the EU multi-region endpoint where supported.

  • Do not use the global endpoint for workloads that require controlled processing location.

  • Repeat the region check for every model version and every Google Cloud project.

For the compliance meaning of regions and endpoints, see LiteLLM - AI Gateway.

References: Deployments and endpoints, Gemini Enterprise Agent Platform partner models for MaaS.

Validation

After deployment, validate the setup before exposing the model to users:

  • Confirm the LiteLLM pod can read /etc/gcp/credentials.json.

  • Confirm the projected token exists at /var/run/secrets/tokens/gcp-token.

  • Confirm the Kubernetes ServiceAccount can impersonate the Google Cloud Service Account.

  • Confirm the target model is enabled in the correct Google Cloud project.

  • Confirm quota is available for the selected model and region.

  • Send a test request through LiteLLM using the configured model_name.

If validation touches production data, provider access, model availability, data residency, or logging behavior, align with LiteLLM - AI Gateway.

Troubleshooting

Issue

Likely Cause

Check

WIF authentication fails

OIDC issuer, audience, or subject mismatch

Compare the Kubernetes issuer URL, projected token audience, WIF provider audience, and IAM impersonation subject.

Permission denied from Vertex AI

Missing IAM role

Ensure the Google Cloud Service Account has roles/aiplatform.user in the target project.

Model not found

Wrong model ID or region

Check the exact model identifier and vertex_location against Google’s model availability docs.

Model access denied

Model not enabled or terms not accepted

Open Model Garden in the same Google Cloud project and verify model enablement, terms, billing, and quota.

Network errors

Egress blocked

Allow outbound HTTPS to required Google APIs, including Vertex AI, IAM Credentials, and Security Token Service endpoints.

LiteLLM uses wrong project

Global env or per-model override mismatch

Check VERTEXAI_PROJECT, GOOGLE_APPLICATION_CREDENTIALS, and any per-model vertex_project or vertex_credentials settings.

Last updated