Master LiteLLM with our step-by-step tutorial, detailed feature walkthrough, and expert tips.
Install LiteLLM via pip (pip install litellm) or pull the Docker image (docker pull ghcr.io/berriai/litellm:main
latest) for the proxy server Create a config.yaml file defining your LLM providers and API keys — see docs.litellm.ai/docs/proxy/docker_quick_start for templates Start the proxy server with 'litellm
config config.yaml' and verify it is running at http://localhost:4000 Point your existing OpenAI SDK client to the LiteLLM proxy URL (base_url='http://localhost:4000') and test with a completion request Set up virtual keys and budget limits for your team using the /key/generate API endpoint to control access and spending
💡 Quick Start: Follow these 3 steps in order to get up and running with LiteLLM quickly.
Explore the key features that make LiteLLM powerful for deployment & hosting workflows.
LiteLLM provides a single OpenAI-compatible endpoint that routes to 100+ LLM providers including OpenAI, Anthropic, Google, AWS Bedrock, Azure, Cohere, and Mistral. Applications can switch providers by changing a model name parameter rather than rewriting each provider integration. Supported capabilities vary by provider and model. Source: https://docs.litellm.ai/ and https://models.litellm.ai/.
Distributes requests across multiple providers and deployment regions using configurable routing strategies. When a provider returns errors or hits rate limits, requests can cascade to backup models with retry behavior and backoff settings. This is useful for teams that need production applications to continue operating when a single provider is unavailable or constrained. Source: https://docs.litellm.ai/.
Calculates LLM costs from token usage and provider pricing data where supported. Spend can be attributed to API keys, users, teams, and organizations, and teams can configure budget limits to control usage. LiteLLM also supports tag-based attribution and export workflows for teams that need reporting outside the proxy. Source: https://docs.litellm.ai/docs/proxy/budget_manager.
Enterprise options add capabilities such as JWT-based authentication, SSO integration, audit logging, support, and custom service-level terms according to LiteLLM's public feature and AI gateway pages. Self-hosted deployment can help organizations keep the gateway layer within their own infrastructure, though teams still need to review provider data handling and compliance requirements. Sources: https://www.litellm.ai/features and https://www.litellm.ai/ai-gateway.
Native integrations with Langfuse, Arize Phoenix, Langsmith, and OpenTelemetry provide visibility into model performance, latency, errors, and cost trends. Prometheus metrics enable Grafana dashboard integration for alerting on spend thresholds, error spikes, and latency degradation. Sources: https://docs.litellm.ai/docs/proxy/observability and https://www.litellm.ai/features.
Create virtual API keys for individual developers or teams, each with configurable budget limits, rate limits such as RPM and TPM, and model access permissions. This centralizes API key management so platform teams can control which models teams access without distributing raw provider credentials broadly. Source: https://docs.litellm.ai/docs/proxy/virtual_keys.
Yes. LiteLLM is available as a Python package (pip install litellm) that you can use as a library in your code or run as a standalone proxy server. Docker is recommended for production deployments but not required.
LiteLLM adds a gateway hop between your application and model provider. Actual latency depends on deployment location, logging configuration, routing rules, provider latency, and network conditions, so teams should benchmark it in their own environment before production rollout.
Direct provider SDKs can be simpler for a single provider. LiteLLM is more useful when teams need automatic failover, unified spend tracking, budget enforcement, and the ability to switch or combine providers behind an OpenAI-compatible interface.
LiteLLM can be self-hosted so the gateway runs inside your own infrastructure. However, model requests still go to the configured model providers unless routed to local models, so teams should review both LiteLLM deployment settings and each provider's data handling policies.
LiteLLM supports 100+ providers including OpenAI, Anthropic Claude, Google Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Together AI, Replicate, Hugging Face, Ollama for local models, and many more.
Yes. LiteLLM supports routing to local model servers including Ollama, vLLM, and OpenAI-compatible endpoints. This allows teams to mix cloud and local models in the same routing configuration with unified logging and spend tracking.
Now that you know how to use LiteLLM, it's time to put this knowledge into practice.
Sign up and follow the tutorial steps
Check pros, cons, and user feedback
See how it stacks against alternatives
Follow our tutorial and master this powerful deployment & hosting tool in minutes.
Tutorial updated March 2026