OpenAI's API for real-time voice conversations and audio processing, enabling low-latency speech-to-speech interactions.
OpenAI's API for real-time voice conversations and audio processing, enabling low-latency speech-to-speech interactions.
The OpenAI Realtime API is a paid, usage-based Voice/Audio AI developer service that delivers sub-second speech-to-speech interactions starting at $40 per 1M audio input tokens and $80 per 1M audio output tokens (gpt-4o-mini-realtime), enabling developers to build low-latency voice agents without stitching together separate STT, LLM, and TTS pipelines.
Rather than cascading audio through discrete speech-to-text, language model, and text-to-speech stages — an approach that typically adds 2–5 seconds of round-trip latency — the Realtime API accepts audio (and text) input and returns audio (and text) output through a single streaming connection. This unified architecture delivers approximately 300 ms first-audio latency over WebRTC, preserves prosodic and emotional nuances of speech, and enables natural turn-taking behaviors such as interruption handling and back-channeling that feel much closer to human conversation than traditional cascaded voice stacks.
Under the hood, the Realtime API exposes a persistent, bidirectional session — established via WebSocket or WebRTC — over which developers exchange structured events. These events cover session configuration (voice selection, instructions, modalities, turn detection settings), conversation state (adding user messages, managing conversation items), response generation (triggering model responses, streaming audio deltas), and tool/function calling. The event-driven model lets applications react incrementally as audio tokens stream back, so users start hearing responses within hundreds of milliseconds rather than waiting for a full generation to complete.
The API supports server-side voice activity detection (VAD) with configurable silence thresholds (default 500 ms), which automatically detects when a user starts and stops speaking, enabling hands-free, always-listening experiences. It also supports function calling in the same way the standard Chat Completions and Responses APIs do, which means voice agents can look up data, trigger workflows, or interact with external systems mid-conversation. Developers can pick from a set of built-in voices, tune the model's persona via system instructions, and switch seamlessly between text and audio modalities within a single session.
Typical use cases include customer support voice agents, voice-enabled copilots inside web and mobile apps, language tutoring and pronunciation coaching, accessibility tools, in-car and smart-device assistants, and interactive gaming NPCs. Because OpenAI offers both WebRTC (ideal for browsers and mobile clients) and WebSocket (ideal for server-to-server scenarios) transports, teams can build end-user experiences that connect devices directly to OpenAI while keeping their own backend in the loop for authentication, business logic, and tool execution. The Realtime API is positioned as the foundation for a new generation of voice-first AI products, combining the reasoning quality of GPT-class models with the immediacy required for real conversation.
Was this helpful?
From $40/1M audio input tokens (gpt-4o-mini-realtime)
Custom volume discounts
Ready to get started with OpenAI Realtime API?
View Pricing Options →We believe in transparent reviews. Here's what OpenAI Realtime API doesn't handle well:
Weekly insights on the latest AI tools, features, and trends delivered to your inbox.
Through early 2026, OpenAI has continued to iterate on the Realtime API with a focus on production readiness: improved voice quality and more natural-sounding built-in voices, broader language coverage, more robust interruption handling, and tighter integration with the Responses API and Agents SDK so voice agents can share the same tool definitions and orchestration logic as text-based agents. Transport improvements on WebRTC have reduced first-audio latency, and pricing has trended downward as newer, more efficient Realtime-capable models are released alongside OpenAI's latest GPT generations.
No reviews yet. Be the first to share your experience!
Get started with OpenAI Realtime API and see if it's the right fit for your needs.
Get Started →Take our 60-second quiz to get personalized tool recommendations
Find Your Perfect AI Stack →Explore 20 ready-to-deploy AI agent templates for sales, support, dev, research, and operations.
Browse Agent Templates →