A control layer for your AI applications: add caching, rate limiting, cost tracking, and analytics to any LLM provider.
Cloudflare AI Gateway is a proxy service that gives developers unified observability, caching, rate limiting, and failover across any LLM provider with a one-line code change, available free on all Cloudflare plans. It targets engineering teams running production AI applications who need cost control, reliability, and analytics without rewriting their stack.
The service operates as an intelligent proxy layer between AI applications and model providers, currently supporting 20+ providers including OpenAI, Anthropic, Google AI Studio, Google Vertex AI, Amazon Bedrock, Workers AI, Azure OpenAI, Cohere, DeepSeek, Mistral AI, Groq, Perplexity, Replicate, ElevenLabs, HuggingFace, OpenRouter, xAI, Cerebras, and more. Integration requires only swapping the API endpoint URL — existing authentication and request schemas remain unchanged. Beyond basic proxying, AI Gateway offers a Unified API (OpenAI-compatible) so a single request format works across providers, plus advanced features in beta like Dynamic Routing with JSON configuration, Data Loss Prevention (DLP), Guardrails for content moderation, BYOK (bring your own keys), and Custom Providers. The WebSockets API beta supports both realtime and non-realtime streaming.
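To illustrate the one-line integration, here is a minimal sketch of the provider-specific gateway URL format. The path shape (`/v1/{account_id}/{gateway_id}/{provider}`) follows Cloudflare's documented endpoint pattern; the account and gateway IDs are placeholders, and the SDK usage shown in comments is an assumption about your client setup.

```python
# Placeholders: substitute your own Cloudflare account and gateway IDs.
ACCOUNT_ID = "your_cloudflare_account_id"
GATEWAY_ID = "my-gateway"

def gateway_base_url(provider: str) -> str:
    """Build the provider-specific AI Gateway base URL.

    Only the base URL changes; the request body and existing
    Authorization header stay exactly as they were.
    """
    return f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_ID}/{provider}"

# e.g. with an OpenAI-style SDK (not executed here):
# client = OpenAI(base_url=gateway_base_url("openai"), api_key=...)
print(gateway_base_url("openai"))
```

The same function yields the Anthropic, Workers AI, or any other supported provider's route by changing the final path segment.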
For observability, AI Gateway provides analytics on request volumes, token consumption, costs per provider, and latency, plus full request/response logging, custom metadata tagging, OpenTelemetry export, and Workers Logpush integration. Caching can serve repeat requests directly from Cloudflare's edge for sub-10ms responses, and rate limiting prevents runaway costs from misbehaving clients or agents. Request retry and model fallback automatically reroute traffic during provider outages — particularly valuable for AI agents that depend on consistent uptime.
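The custom metadata tagging mentioned above is attached per request via a gateway header. A minimal sketch, assuming the `cf-aig-metadata` header name from Cloudflare's docs carries a JSON object; verify the exact format against current documentation.

```python
import json

def tagged_headers(api_key: str, **metadata) -> dict:
    """Build request headers that tag a gateway call with custom
    metadata (e.g. user or team IDs) so it is attributable in
    AI Gateway's logs and analytics."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "cf-aig-metadata": json.dumps(metadata),
    }

headers = tagged_headers("sk-placeholder", user_id="u_123", team="search")
```

Filtering logs by these metadata keys is then a matter of querying the gateway dashboard or your Logpush destination.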
Based on our analysis of 870+ AI tools, AI Gateway stands out among LLM proxy/observability platforms for its zero-cost entry point and edge-network deployment. Compared to alternatives like Helicone, Langfuse, and LangSmith in our directory, AI Gateway uniquely bundles proxying with Cloudflare's broader infrastructure (Workers AI, Vectorize, R2), making it the natural choice for teams already on Cloudflare. However, it lacks the deep prompt-engineering and evaluation tooling that purpose-built LLMOps platforms provide. Last documentation update: April 20, 2026.
Cloudflare AI Gateway provides essential observability and control for production AI applications. The combination of caching, rate limiting, and analytics makes it valuable for any organization running AI at scale.
A single OpenAI-compatible request schema works across all 20+ supported providers, so you can swap models without changing client code. This makes A/B testing, multi-provider routing, and fallback chains dramatically simpler. Combined with the Vercel AI SDK integration, it lets full-stack apps treat heterogeneous models as one interface.
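To make the single-schema claim concrete, here is a sketch of a Unified API request body where switching providers is a one-string change. The `{provider}/{model}` model identifier and the `/compat` path segment follow Cloudflare's OpenAI-compatible endpoint convention; the model names themselves are illustrative.

```python
# Illustrative base URL for the Unified (OpenAI-compatible) endpoint;
# ACCOUNT_ID and GATEWAY_ID are placeholders.
BASE = "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/compat"

def chat_payload(model: str, prompt: str) -> dict:
    """Build one OpenAI-shaped request body; the provider is selected
    purely by the '{provider}/{model}' string."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same payload shape, two different providers:
a = chat_payload("openai/gpt-4o-mini", "ping")
b = chat_payload("anthropic/claude-3-5-haiku", "ping")
```

Because `a` and `b` differ only in the model string, an A/B test or fallback chain needs no provider-specific request code.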
AI Gateway can serve repeat requests directly from Cloudflare's cache with sub-10ms latency, bypassing the origin model provider entirely. This is particularly powerful for deterministic prompts, FAQ-style chatbots, and agent workflows that re-query similar context. Cache policies are configurable per-gateway based on tolerance for response variation.
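Per-request cache behavior is controlled with gateway headers. A minimal sketch, assuming the `cf-aig-cache-ttl` and `cf-aig-skip-cache` header names from Cloudflare's docs; confirm both against current documentation before relying on them.

```python
def cache_headers(ttl_seconds=None, skip=False):
    """Build the optional cache-control headers for a gateway request.

    ttl_seconds: how long a cached response may be served for repeat
                 requests (hits never reach the origin provider).
    skip:        force a fresh completion, bypassing the cache.
    """
    h = {}
    if ttl_seconds is not None:
        h["cf-aig-cache-ttl"] = str(ttl_seconds)
    if skip:
        h["cf-aig-skip-cache"] = "true"
    return h

# Cache deterministic FAQ-style prompts for an hour:
faq_headers = cache_headers(ttl_seconds=3600)
```

A short TTL suits prompts where mild staleness is acceptable; `skip=True` is the escape hatch for requests that must always hit the model.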
Automatically retries failed requests and falls back to a configured backup provider/model when the primary errors or times out. This turns provider outages from incidents into transparent failovers for end users. It is especially valuable for autonomous agents that cannot afford a single broken upstream call.
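A fallback chain can be expressed as an ordered array of provider requests sent to the gateway's Universal Endpoint, which tries each step in turn. The step shape (`provider` / `endpoint` / `headers` / `query`) follows Cloudflare's documented format; the model names and key placeholders below are illustrative.

```python
def fallback_chain(prompt: str, openai_key: str, workers_ai_token: str) -> list:
    """Build a Universal Endpoint request body: the gateway attempts
    the first step and only falls through to the next on error or
    timeout."""
    return [
        {   # primary: OpenAI
            "provider": "openai",
            "endpoint": "chat/completions",
            "headers": {"Authorization": f"Bearer {openai_key}"},
            "query": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        },
        {   # fallback: Workers AI, reached only if the step above fails
            "provider": "workers-ai",
            "endpoint": "@cf/meta/llama-3.1-8b-instruct",
            "headers": {"Authorization": f"Bearer {workers_ai_token}"},
            "query": {"prompt": prompt},
        },
    ]

chain = fallback_chain("hello", "sk-placeholder", "cf-placeholder")
```

POSTing this array to `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}` (not executed here) turns a provider outage into a silent failover rather than a user-facing error.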
JSON-configurable routing logic that lets you direct traffic between providers based on rules, weights, or conditions — useful for canary rollouts, cost-optimized routing, or tier-based model selection. Configuration is declarative and managed per-gateway. Combined with the Unified API, it enables sophisticated multi-model strategies without application code changes.
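As a purely hypothetical illustration of the weighted-routing idea (the actual Dynamic Routing JSON schema is in beta and not shown on this page, so every key below is invented for the sketch):

```python
# Hypothetical shape of a canary-rollout routing rule: 90% of traffic
# stays on the stable model, 10% goes to the candidate. None of these
# key names come from Cloudflare's schema; consult the Dynamic Routing
# docs for the real format.
canary_route = {
    "name": "canary-model-upgrade",
    "routes": [
        {"target": "openai/gpt-4o",  "weight": 90},  # stable
        {"target": "openai/gpt-4.1", "weight": 10},  # canary slice
    ],
}
```

Whatever the concrete schema, the declarative nature of the config means the split can be adjusted per-gateway without touching application code.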
DLP scans inbound and outbound payloads to prevent leakage of sensitive data, while Guardrails apply content moderation rules across supported model types. Both run inline at the gateway, so protections apply uniformly regardless of which provider is called. This is meaningful for regulated workloads where prompt or response content must be sanitized before crossing service boundaries.
Pricing: $0, bundled with your Cloudflare plan.
Documentation last updated April 20, 2026, reflecting an expanded provider lineup (20+ providers including Cerebras, Baseten, Cartesia, Parallel, xAI, Ideogram, and Fal AI), the Unified OpenAI-compatible API, beta releases of Dynamic Routing with JSON configuration, Data Loss Prevention (DLP), Guardrails, BYOK key storage, Custom Providers, and a WebSockets API supporting both realtime and non-realtime streaming.
Alternatives in the Analytics & Monitoring category:

- Helicone: open-source LLM observability platform and API gateway providing cost analytics, request logging, caching, and rate limiting through a proxy-based integration that requires only a base URL change.
- LangSmith: lets you trace, analyze, and evaluate LLM applications and agents with deep observability into every model call, chain step, and tool invocation.
- Langfuse: leading open-source LLM observability platform for production AI applications, with comprehensive tracing, prompt management, evaluation frameworks, and cost optimization, enterprise security certifications (SOC 2, ISO 27001, HIPAA), and self-hosting with full feature parity.