Stay on the free plan if you only need basic features, and upgrade when you need the advanced ones. Most solo builders can start free.
A typical voice stack runs three sequential models: speech-to-text, an LLM, then text-to-speech. Each hop adds latency and the STT step throws away tone, pacing, and emotion. Ultravox uses a single speech-native model that takes audio in and produces a conversational response directly, which both reduces end-to-end latency to sub-second levels and preserves paralinguistic signals the model can reason about.
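To make the latency argument concrete, here is a minimal sketch of how sequential hops add up. The `Stage` class, the `end_to_end_ms` helper, and every millisecond figure are illustrative placeholders introduced for this example; they are not measurements of Ultravox or of any other stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # placeholder value, not a benchmark


def end_to_end_ms(stages: list[Stage]) -> float:
    """Stages run one after another, so total latency is the sum of the hops."""
    return sum(stage.latency_ms for stage in stages)


# Illustrative placeholder numbers only (assumptions, not measurements).
cascaded = [
    Stage("speech-to-text", 300.0),
    Stage("LLM", 500.0),
    Stage("text-to-speech", 250.0),
]
speech_native = [Stage("speech-native model", 600.0)]

cascaded_ms = end_to_end_ms(cascaded)
speech_native_ms = end_to_end_ms(speech_native)
print(f"cascaded: {cascaded_ms:.0f} ms, speech-native: {speech_native_ms:.0f} ms")
```

The point of the sketch is structural rather than numerical: in a cascaded stack each hop's delay adds to the total, so removing hops is the most direct way to cut end-to-end latency.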
Yes. Ultravox is designed to plug into telephony providers such as Twilio so you can build inbound and outbound phone agents, and it also supports WebRTC for browser- and app-based voice. You bring the telephony account; Ultravox handles the real-time voice intelligence.
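As a rough sketch of what the telephony side can look like, the snippet below builds the TwiML that asks Twilio to stream a call's audio to a WebSocket endpoint using Twilio's `<Connect>` and `<Stream>` verbs. The `build_stream_twiml` helper and the URL are hypothetical; how a given voice platform issues its streaming URL is an assumption here and should be checked against that platform's documentation.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(stream_url: str) -> str:
    """Build TwiML that connects the call's audio to a WebSocket URL."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": stream_url})
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; a real one would come from the voice platform.
twiml = build_stream_twiml("wss://voice.example.invalid/calls/abc123")
print(twiml)
```

Your webhook would return this XML to Twilio when a call arrives; Twilio then carries the phone audio while the voice platform handles the conversation.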
Yes. Voice agents built on Ultravox can call developer-defined tools and functions during a live conversation, which means they can look up records, hit internal APIs, transfer calls, send messages, or trigger workflows rather than only chat.
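The general pattern behind tool calling can be sketched as a registry of functions plus a dispatcher that runs whichever one the model asks for. Everything here (`TOOLS`, `tool`, `lookup_order`, `dispatch`, and the JSON request shape) is hypothetical and generic; it is not the Ultravox API, whose actual tool definition format should be taken from its documentation.

```python
import json
from typing import Any, Callable

# Hypothetical registry mapping tool names to developer-defined functions.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> dict[str, str]:
    # Stand-in for a real database or internal API call.
    return {"order_id": order_id, "status": "shipped"}


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-issued request and return a JSON result."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call['name']}"})
    return json.dumps(fn(**call.get("arguments", {})))


result = dispatch('{"name": "lookup_order", "arguments": {"order_id": "A123"}}')
print(result)
```

Returning an error payload for an unknown tool, instead of raising, lets the agent recover mid-call by apologizing or trying another route.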
The Ultravox model has been published on Hugging Face and can be self-hosted, which is unusual in the real-time voice AI space. Most teams still use the managed API for production because it handles scaling, infrastructure, and telephony integration, but the open weights are available for teams that need full control.
Fixie.ai is the company's previous name, from when it positioned itself as a broader agent platform. The team narrowed its focus to real-time voice and rebranded to Ultravox, which is now the name of both the product and the underlying speech-native model. Existing Fixie API users were migrated onto the Ultravox platform.
Last verified March 2026