# Vertex AI - Self Deployed Models
Deploy and use your own models on Vertex AI through Model Garden or custom endpoints.
## Model Garden
> **Tip:** All OpenAI compatible models from Vertex Model Garden are supported.
### Using Model Garden
Almost all Vertex Model Garden models are OpenAI compatible.
**OpenAI Compatible Models**
| Property | Details |
|---|---|
| Provider Route | vertex_ai/openai/{MODEL_ID} |
| Vertex Documentation | Model Garden LiteLLM Inference, Vertex Model Garden |
| Supported Operations | /chat/completions, /embeddings |
**SDK**
```python
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/openai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
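The table above also lists `/embeddings` as a supported operation. A minimal sketch of calling it through the same route, assuming the Model Garden endpoint you deployed actually serves an embedding model (the endpoint ID below is a placeholder):

```python
from litellm import embedding
import os

os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# Same vertex_ai/openai/ route, but using the embeddings operation
response = embedding(
    model="vertex_ai/openai/<your-embedding-endpoint-id>",
    input=["Hello, how are you?"],
)
print(response.data[0]["embedding"][:5])  # first few embedding values
```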
**Proxy**

1. Add to config
```yaml
model_list:
  - model_name: llama3-1-8b-instruct
    litellm_params:
      model: vertex_ai/openai/5464397967697903616
      vertex_ai_project: "my-test-project"
      vertex_ai_location: "us-east1"
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000
```
3. Test it! (the `model` value is the `model_name` from your config)

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3-1-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
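Because the proxy exposes an OpenAI-compatible `/chat/completions` endpoint, you can also call it with the OpenAI Python SDK instead of curl. A minimal sketch, assuming the proxy is running at `http://0.0.0.0:4000` with the virtual key `sk-1234` from the example above:

```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy
client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="llama3-1-8b-instruct",  # the 'model_name' from the config
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```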
**Non-OpenAI Compatible Models**

For endpoints that are not OpenAI compatible, call them through the `vertex_ai/{MODEL_ID}` route directly:

```python
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
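Streaming works through the same route by passing `stream=True`, which returns an iterator of chunks. A minimal sketch (the endpoint ID is a placeholder):

```python
from litellm import completion
import os

os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# stream=True yields OpenAI-style chunks as they arrive
response = completion(
    model="vertex_ai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```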
## Gemma Models (Custom Endpoints)
Deploy Gemma models on custom Vertex AI prediction endpoints with OpenAI-compatible format.
| Property | Details |
|---|---|
| Provider Route | vertex_ai/gemma/{MODEL_NAME} |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | api_base - Full prediction endpoint URL |
**Proxy Usage:**
1. Add to config.yaml
```yaml
model_list:
  - model_name: gemma-model
    litellm_params:
      model: vertex_ai/gemma/gemma-3-12b-it-1222199011122
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml
```
3. Test it
```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemma-model",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "max_tokens": 100
  }'
```
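The proxy also accepts the standard OpenAI `stream` parameter if you want server-sent events instead of a single JSON response. A sketch of the same request with streaming enabled:

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemma-model",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "max_tokens": 100,
    "stream": true
  }'
```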
**SDK Usage:**
```python
from litellm import completion

response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
```
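If you don't want to rely on application-default credentials, LiteLLM also accepts a `vertex_credentials` parameter containing a service-account key. A minimal sketch, assuming the key is stored as a local JSON file (the path is a placeholder):

```python
import json
from litellm import completion

# Load a service-account key and pass it explicitly (path is a placeholder)
with open("path/to/vertex_service_account.json", "r") as f:
    vertex_credentials = json.dumps(json.load(f))

response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
    vertex_credentials=vertex_credentials,
)
```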
## MedGemma Models (Custom Endpoints)
Deploy MedGemma models on custom Vertex AI prediction endpoints with OpenAI-compatible format. MedGemma models use the same `vertex_ai/gemma/` route.
| Property | Details |
|---|---|
| Provider Route | vertex_ai/gemma/{MODEL_NAME} |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | api_base - Full prediction endpoint URL |
**Proxy Usage:**
1. Add to config.yaml
```yaml
model_list:
  - model_name: medgemma-model
    litellm_params:
      model: vertex_ai/gemma/medgemma-2b-v1
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml
```
3. Test it
```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "medgemma-model",
    "messages": [{"role": "user", "content": "What are the symptoms of hypertension?"}],
    "max_tokens": 100
  }'
```
**SDK Usage:**
```python
from litellm import completion

response = completion(
    model="vertex_ai/gemma/medgemma-2b-v1",
    messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
```
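For async applications, the same call works with `litellm.acompletion`, which mirrors the `completion` signature. A minimal sketch under the same assumptions as the example above:

```python
import asyncio
from litellm import acompletion

async def main():
    # Same parameters as the synchronous example above
    response = await acompletion(
        model="vertex_ai/gemma/medgemma-2b-v1",
        messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
        api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
        vertex_project="my-project-id",
        vertex_location="us-central1",
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```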