OpenAI Realtime API Pricing & Plans 2026

Pricing guide for the OpenAI Realtime API: per-token rates for both realtime models, what the pay-as-you-go and enterprise options include, and the main trade-offs to weigh before committing.

💎 2 Paid Plans

⚡ No Setup Fees

Choose Your Plan

Pay-as-you-go API usage

From $40/1M audio input tokens (gpt-4o-mini-realtime)

✓ gpt-4o-realtime: $100 per 1M audio input tokens, $200 per 1M audio output tokens; text at $5/$20 per 1M tokens
✓ gpt-4o-mini-realtime: $40 per 1M audio input tokens, $80 per 1M audio output tokens; text at $2.50/$10 per 1M tokens
✓ Access to all Realtime-capable GPT models
✓ WebRTC and WebSocket transports included
✓ Built-in tool/function calling and VAD at no additional charge

Enterprise / Scale

Custom volume discounts

✓ Negotiated per-token rates below published list prices for high-volume commitments
✓ Enterprise agreements with data handling and compliance commitments
✓ Higher rate limits and committed throughput capacity
✓ Support SLAs and dedicated account management

Pricing sourced from OpenAI Realtime API · Last verified March 2026

Feature Comparison

Features	Pay-as-you-go API usage	Enterprise / Scale
gpt-4o-realtime: $100 per 1M audio input tokens, $200 per 1M audio output tokens; text at $5/$20 per 1M tokens	✓	✓
gpt-4o-mini-realtime: $40 per 1M audio input tokens, $80 per 1M audio output tokens; text at $2.50/$10 per 1M tokens	✓	✓
Access to all Realtime-capable GPT models	✓	✓
WebRTC and WebSocket transports included	✓	✓
Built-in tool/function calling and VAD at no additional charge	✓	✓
Negotiated per-token rates below published list prices for high-volume commitments	—	✓
Enterprise agreements with data handling and compliance commitments	—	✓
Higher rate limits and committed throughput capacity	—	✓
Support SLAs and dedicated account management	—	✓

Is OpenAI Realtime API Worth It?

✅ Why Choose OpenAI Realtime API

• Single speech-to-speech pipeline avoids the added latency and quality loss of chaining separate STT, LLM, and TTS services
• Supports both WebRTC and WebSocket transports, making it suitable for browser, mobile, and server-side integrations
• Built-in server-side voice activity detection and interruption handling produce natural turn-taking without custom audio engineering
• Native function/tool calling within voice sessions lets agents invoke APIs, look up data, and complete tasks mid-conversation
• Preserves prosody, tone, and emotional nuance that are typically lost when transcribing speech to text first
• Runs on OpenAI's hosted infrastructure and GPT-4o-class models, which brings strong reasoning and multilingual coverage without self-managed deployment

⚠️ Consider This

• Audio tokens cost far more than text tokens (at the listed rates, 16 to 20 times as much for input and 8 to 10 times as much for output), which can make long or high-volume voice sessions expensive
• Realtime streaming and persistent connections add architectural complexity compared to stateless REST endpoints
• A limited set of built-in voices and no support for fully custom voice cloning restrict brand personalization
• Tight coupling to OpenAI means vendor lock-in and no on-premise or offline deployment option for sensitive workloads
• Event-driven API surface has a steeper learning curve and fewer mature SDK abstractions than standard chat completions

Pricing FAQ

What transports does the OpenAI Realtime API support?

The Realtime API supports WebRTC, which is recommended for browser and mobile clients that need the lowest possible latency, and WebSockets, which are better suited for server-to-server integrations where a backend service mediates between users and the API.
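For the server-to-server path, a backend usually opens one authenticated WebSocket per session, with the model named in the query string. The sketch below only builds the URL and headers and does not open a connection. The endpoint, the `model` query parameter, and the bearer-token header are assumptions based on OpenAI's published connection pattern and should be checked against the current API reference; `build_ws_request` is a hypothetical helper, and the model name follows this page's naming rather than a confirmed model identifier.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the current OpenAI API reference.
REALTIME_WS_BASE = "wss://api.openai.com/v1/realtime"


def build_ws_request(model: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the (url, headers) pair a WebSocket client would connect with."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    url = f"{REALTIME_WS_BASE}?{urlencode({'model': model})}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


# Placeholder key for illustration only; never hard-code real credentials.
url, headers = build_ws_request("gpt-4o-mini-realtime", "sk-placeholder")
print(url)
```

Keeping the API key in a backend like this is the main reason WebSockets suit server-side integrations: the key never has to reach the browser or mobile client.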

Does the Realtime API handle interruptions and turn-taking automatically?

Yes. The API includes server-side voice activity detection (VAD) that detects when a user starts and stops speaking and segments turns automatically. Users can also interrupt the model mid-response, in which case the model's in-progress output is truncated.
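Server-side VAD is normally switched on through session configuration. The following is a minimal sketch of such a configuration message, assuming the `session.update` event and the `turn_detection` fields (`type`, `threshold`, `silence_duration_ms`) as they appear in OpenAI's Realtime documentation; exact field names and defaults may differ between API versions. `server_vad_session_update` is a hypothetical helper, and the default values are illustrative.

```python
import json


def server_vad_session_update(threshold: float = 0.5, silence_ms: int = 500) -> str:
    """Build a JSON session.update message that enables server-side VAD.

    threshold:  assumed 0..1 sensitivity for detecting speech.
    silence_ms: assumed silence duration that ends the user's turn.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "silence_duration_ms": silence_ms,
            }
        },
    }
    return json.dumps(event)


# The resulting string would be sent over the open session's transport.
print(server_vad_session_update(silence_ms=700))
```

A longer silence window makes the agent less likely to cut in during pauses, at the cost of slower responses.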

Can I use function calling and tools in a voice session?

Yes. The Realtime API supports the same tool and function-calling paradigm as OpenAI's other APIs. You can register tools during session configuration, and the model can decide to call them mid-conversation so the voice agent can fetch data or trigger external actions.
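Registering a tool during session configuration might look like the sketch below. It assumes the `session.update` event with a `tools` list of function definitions described by JSON Schema, as in OpenAI's function-calling documentation; the `lookup_order` tool and its parameters are invented for illustration and are not part of any real API.

```python
import json


def tool_session_update() -> str:
    """Build a session.update message that registers one hypothetical tool."""
    lookup_order = {
        "type": "function",
        "name": "lookup_order",  # hypothetical tool name
        "description": "Look up the status of a customer order by its ID.",
        "parameters": {  # JSON Schema describing the tool's arguments
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order identifier"}
            },
            "required": ["order_id"],
        },
    }
    event = {
        "type": "session.update",
        "session": {"tools": [lookup_order], "tool_choice": "auto"},
    }
    return json.dumps(event)


print(tool_session_update())
```

With `tool_choice` set to `auto`, the model decides when to call the tool; the application is still responsible for running the function and returning its result to the session.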

Is the Realtime API limited to audio, or can it handle text as well?

The API is multimodal: a single session can accept and produce text, audio, or both. Developers can configure which modalities are enabled and can mix text inputs (for example, system instructions or silent context updates) with streaming audio within the same conversation.
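A silent text update of the kind mentioned above can be expressed as a conversation item added alongside the audio stream. This sketch assumes the `conversation.item.create` event and the `input_text` content type from OpenAI's Realtime documentation; `text_context_item` is a hypothetical helper, and the set of accepted roles is an assumption to verify against the current reference.

```python
import json


def text_context_item(text: str, role: str = "user") -> str:
    """Build a JSON message that adds a text item to an ongoing conversation."""
    if role not in {"user", "system"}:
        raise ValueError("role must be 'user' or 'system'")
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": role,
            "content": [{"type": "input_text", "text": text}],
        },
    }
    return json.dumps(event)


# Example: give the agent context without the caller hearing anything.
print(text_context_item("The caller is a premium customer.", role="system"))
```

Because text tokens are billed at the lower text rates, passing context this way is cheaper than conveying the same information as audio.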

How is pricing calculated for the Realtime API?

Usage is billed per token with separate rates for audio and text. For the gpt-4o-realtime model, audio input costs $100 per 1M tokens and audio output costs $200 per 1M tokens, while text input is $5 and text output is $20 per 1M tokens. The more affordable gpt-4o-mini-realtime model charges $40 per 1M audio input tokens and $80 per 1M audio output tokens, with text at $2.50 input and $10 output per 1M tokens. Because speech generates more tokens per second than equivalent text, audio-heavy sessions are priced higher, and developers should monitor session duration and output length to control costs.
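The per-token arithmetic can be sketched as a small estimator. The rates below are the ones listed on this page; the token counts in the usage example are hypothetical, since actual counts depend on session length and should be read from the usage data the API reports. `Rates` and `estimate_cost` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens, as listed in this guide."""
    audio_in: float
    audio_out: float
    text_in: float
    text_out: float


RATES = {
    "gpt-4o-realtime": Rates(audio_in=100.0, audio_out=200.0, text_in=5.0, text_out=20.0),
    "gpt-4o-mini-realtime": Rates(audio_in=40.0, audio_out=80.0, text_in=2.50, text_out=10.0),
}


def estimate_cost(model: str, audio_in: int = 0, audio_out: int = 0,
                  text_in: int = 0, text_out: int = 0) -> float:
    """Return the estimated session cost in USD for the given token counts."""
    r = RATES[model]
    total = (audio_in * r.audio_in + audio_out * r.audio_out
             + text_in * r.text_in + text_out * r.text_out)
    return total / 1_000_000


# Hypothetical session: 50k audio in, 30k audio out, 2k text in, 1k text out.
usage = dict(audio_in=50_000, audio_out=30_000, text_in=2_000, text_out=1_000)
for model in RATES:
    print(f"{model}: ${estimate_cost(model, **usage):.3f}")
```

For this hypothetical usage the estimate is $11.03 on gpt-4o-realtime and about $4.42 on gpt-4o-mini-realtime, with audio output as the largest single line item in both cases.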
