LiteLLM – Vertex AI Setup
5 min read
This page describes how to connect a self-hosted LiteLLM deployment to Google Cloud Vertex AI. It is intended for DevOps and platform teams that already operate LiteLLM and need to add Vertex AI models such as Gemini, Claude on Vertex AI, or Mistral on Vertex AI.
For provider approval, data residency, model training, abuse monitoring, content filtering, and contractual review, see LiteLLM - AI Gateway.
For the base LiteLLM deployment, database, Redis, networking, and operational setup, see LiteLLM – Infrastructure Setup.
Scope
This guide covers only the Vertex AI-specific setup:
Google Cloud project prerequisites.
Workload Identity Federation from Kubernetes to Google Cloud.
Partner model enablement in Vertex AI Model Garden.
LiteLLM model configuration for Vertex AI models.
Region and endpoint checks for Vertex AI models.
It does not define which providers, models, regions, or retention settings are allowed. Those decisions belong in the security and compliance process described in LiteLLM - AI Gateway.
Prerequisites
A dedicated Google Cloud project for the target environment, for example one project per tenant and stage.
Billing enabled on the Google Cloud project.
Vertex AI API enabled:
aiplatform.googleapis.com.Permissions to create service accounts, IAM bindings, Workload Identity Federation pools, and Workload Identity Federation providers.
A Kubernetes cluster with an OIDC issuer URL.
Outbound HTTPS connectivity from LiteLLM pods to Google Cloud APIs.
A running LiteLLM deployment. See LiteLLM – Infrastructure Setup.
Reference: Workload Identity Federation with Kubernetes.
Authentication Model
Use Google Cloud Workload Identity Federation (WIF) instead of static service account keys. LiteLLM pods receive a projected Kubernetes ServiceAccount token and exchange it for a short-lived Google Cloud access token. This avoids storing long-lived Google Cloud service account keys in Kubernetes.
Component | Purpose |
|---|---|
Google Cloud Service Account | Identity used by LiteLLM to call Vertex AI. |
| Allows the service account to use Vertex AI. |
Workload Identity Pool | Trust boundary for external identities from the Kubernetes cluster. |
Workload Identity Provider | OIDC provider linked to the Kubernetes cluster issuer. |
Kubernetes ServiceAccount | Runtime identity of the LiteLLM pod. |
Credentials ConfigMap | Google external account credentials file mounted into the LiteLLM pod. |
Setup Steps
1. Create Google Cloud IAM Resources
Create or provision these resources in the target Google Cloud project:
Google Cloud Service Account, for example
vertex-ai-workload@<project-id>.iam.gserviceaccount.com.IAM role binding granting
roles/aiplatform.userto that service account.Workload Identity Pool for the Kubernetes cluster.
Workload Identity Provider using the Kubernetes cluster OIDC issuer URL.
IAM impersonation binding allowing the Kubernetes ServiceAccount subject to impersonate the Google Cloud Service Account via
roles/iam.workloadIdentityUser.
Verify the exact Kubernetes subject format expected by your WIF provider. It is commonly based on system:serviceaccount:<namespace>:<service-account>.
Reference: Manage Workload Identity Federation pools and providers.
2. Enable Vertex AI Models
Enable the required models in Vertex AI before adding them to LiteLLM. Google models such as Gemini and partner models such as Anthropic Claude or Mistral may have different availability, terms, and enablement flows.
Open Vertex AI Model Garden in the target Google Cloud project.
Select the model card for the exact model version.
Click Enable.
Accept the model-specific terms where required.
Confirm billing and quota are available for the selected model and region.
Partner model enablement is per Google Cloud project. Repeat it for every project and environment where the model is needed.
Before enabling a model for production use, complete the checks in LiteLLM - AI Gateway.
3. Mount WIF Credentials into LiteLLM
Create a ConfigMap with the Google external account credentials configuration. Mount it into the LiteLLM pod, together with a projected Kubernetes ServiceAccount token.
apiVersion: v1
kind: ConfigMap
metadata:
name: gcp-wif
data:
credentials.json: |
{
"audience": "//iam.googleapis.com/projects/<gcp-project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>",
"credential_source": {
"file": "/var/run/secrets/tokens/gcp-token",
"format": {
"type": "text"
}
},
"service_account_impersonation_url": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/vertex-ai-workload@<gcp-project-id>.iam.gserviceaccount.com:generateAccessToken",
"subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
"token_url": "https://sts.googleapis.com/v1/token",
"type": "external_account"
}Configure volumes and volume mounts on the LiteLLM deployment:
volumes:
- name: gcp-wif
configMap:
name: gcp-wif
- name: gcp-token
projected:
sources:
- serviceAccountToken:
audience: "//iam.googleapis.com/projects/<gcp-project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>"
expirationSeconds: 3600
path: gcp-token
volumeMounts:
- name: gcp-wif
mountPath: /etc/gcp
readOnly: true
- name: gcp-token
mountPath: /var/run/secrets/tokens
readOnly: trueThe token audience in the projected token must match the audience in credentials.json.
4. Set LiteLLM Environment Variables
Set these environment variables on the LiteLLM deployment:
GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/credentials.json
VERTEXAI_PROJECT=<gcp-project-id>With these values set globally, individual model entries normally only need the model identifier and Vertex AI location. Use per-model overrides only when a model must use a different Google Cloud project or credentials file.
5. Add Vertex AI Models to LiteLLM
Add Vertex AI model entries to the LiteLLM model list. Use the vertex_ai/ model prefix and set vertex_location explicitly.
model_list:
- model_name: claude-sonnet-vertex
litellm_params:
model: vertex_ai/claude-sonnet-4@20250514
vertex_location: europe-west1
- model_name: gemini-pro-vertex
litellm_params:
model: vertex_ai/gemini-2.5-pro
vertex_location: europe-west6The model_name is the name exposed through LiteLLM. The litellm_params.model value is the actual Vertex AI model identifier used by LiteLLM.
Reference: LiteLLM Vertex AI provider documentation.
Regions and Endpoints
Always verify region support for the exact model version before enabling it. Google and partner models can differ by region, and preview or partner models may have more limited availability.
For strict EU processing, use a European Vertex AI region or the EU multi-region endpoint where supported.
Do not use the global endpoint for workloads that require controlled processing location.
Repeat the region check for every model version and every Google Cloud project.
For the compliance meaning of regions and endpoints, see LiteLLM - AI Gateway.
References: Deployments and endpoints, Gemini Enterprise Agent Platform partner models for MaaS.
Validation
After deployment, validate the setup before exposing the model to users:
Confirm the LiteLLM pod can read
/etc/gcp/credentials.json.Confirm the projected token exists at
/var/run/secrets/tokens/gcp-token.Confirm the Kubernetes ServiceAccount can impersonate the Google Cloud Service Account.
Confirm the target model is enabled in the correct Google Cloud project.
Confirm quota is available for the selected model and region.
Send a test request through LiteLLM using the configured
model_name.
If validation touches production data, provider access, model availability, data residency, or logging behavior, align with LiteLLM - AI Gateway.
Troubleshooting
Issue | Likely Cause | Check |
|---|---|---|
WIF authentication fails | OIDC issuer, audience, or subject mismatch | Compare the Kubernetes issuer URL, projected token audience, WIF provider audience, and IAM impersonation subject. |
Permission denied from Vertex AI | Missing IAM role | Ensure the Google Cloud Service Account has |
Model not found | Wrong model ID or region | Check the exact model identifier and |
Model access denied | Model not enabled or terms not accepted | Open Model Garden in the same Google Cloud project and verify model enablement, terms, billing, and quota. |
Network errors | Egress blocked | Allow outbound HTTPS to required Google APIs, including Vertex AI, IAM Credentials, and Security Token Service endpoints. |
LiteLLM uses wrong project | Global env or per-model override mismatch | Check |