Speech-to-text, text-to-speech and voice agent APIs with industry-leading latency, accuracy and per-language model quality.
Deepgram is a long-running speech AI platform that has quietly become the default STT engine behind a large share of production voice agents, contact-centre analytics tools and meeting bots. Its Nova-3 STT model delivers state-of-the-art word error rate across 30+ languages with sub-300ms streaming latency, includes diarisation, smart formatting and keyword boosting, and costs less per minute than competing managed providers.

Deepgram also ships Aura, a streaming TTS model designed for low-latency voice agents, and the Deepgram Voice Agent API, a single endpoint that combines STT, an LLM of your choice and Aura TTS, with turn-taking handled server-side. It is the cleanest way to ship a phone-able agent if you want one vendor end-to-end. Beyond real-time use, Deepgram offers strong batch transcription for podcast and video workflows, with topic detection, entity extraction, summarisation and translation.

New customers start with a $200 credit, then pay metered per-minute rates that scale down with volume. Enterprise customers can run Deepgram fully on-prem for HIPAA and air-gapped use cases. Deepgram remains the default choice when accuracy per dollar matters more than brand cachet.
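To make the STT options above concrete, here is a minimal sketch of how a batch transcription request might be put together. It only builds the URL and headers and sends nothing over the network. The endpoint `https://api.deepgram.com/v1/listen`, the query parameters `model`, `language`, `diarize` and `smart_format`, and the `Token` authorisation scheme are assumptions based on Deepgram's public API as commonly documented; the helper names `build_listen_url` and `build_headers` are hypothetical. Keyword boosting is left out because its parameter name may differ between model generations. Check the current API reference before relying on any of this.

```python
from urllib.parse import urlencode

# Assumed base URL for Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model="nova-3", language="en", diarize=True, smart_format=True):
    """Return a request URL with the assumed query parameters for the features named in the review."""
    params = {
        "model": model,                             # assumed model identifier
        "language": language,                       # assumed language code
        "diarize": str(diarize).lower(),            # speaker labelling on/off
        "smart_format": str(smart_format).lower(),  # punctuation and number formatting on/off
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def build_headers(api_key, content_type="audio/wav"):
    """Return the HTTP headers for an audio upload (assumed 'Token' auth scheme)."""
    return {
        "Authorization": f"Token {api_key}",
        "Content-Type": content_type,
    }
```

The resulting URL and headers could then be passed to any HTTP client together with the raw audio bytes as the request body.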
Deepgram offers the best price-to-performance ratio in speech-to-text with Nova-2's industry-leading accuracy and sub-300ms real-time latency. The combined STT/TTS offering simplifies voice application development, though TTS voice variety is more limited than specialized services.
$200 credit
From $0.0043/min
Contact sales
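As a rough guide to what the free credit buys, the sketch below divides the $200 credit by the listed entry rate of $0.0043 per minute. It assumes the whole credit is spent at that single rate, which may not hold for every model, for streaming, or for add-on features. The function name `minutes_covered` is illustrative only.

```python
def minutes_covered(credit_usd, rate_per_min_usd):
    """Return how many audio minutes a credit covers at a flat per-minute rate."""
    if rate_per_min_usd <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_per_min_usd


# Figures taken from the pricing lines above; the flat rate is an assumption.
minutes = minutes_covered(200.0, 0.0043)
hours = minutes / 60
print(f"{minutes:,.0f} minutes (about {hours:,.0f} hours)")
```

Under those assumptions the credit covers roughly 46,500 minutes, or about 775 hours, of audio.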
Deepgram launched Flux, a multilingual conversational speech-to-text model supporting 10 languages (English, Spanish, German, French, Hindi, Russian, Portuguese, Japanese, Italian, Dutch) with automatic language detection and intelligent endpointing optimized for voice agents. The unified Voice Agent API has been promoted as Deepgram's flagship offering, combining STT, LLM orchestration, and TTS in a single endpoint, alongside a deeper Amazon Connect integration for contact center deployments.