Stay free if you only need $200 in free API credits on signup and access to Nova STT, Aura TTS, and the Voice Agent API. Upgrade if you need discounted volume pricing on STT and TTS, plus higher concurrency and rate limits. Most solo builders can start free.
Known limitations (all apply from the Pay As You Go tier up):

- Aura TTS offers a smaller voice catalog and less expressive range than specialized providers like ElevenLabs or PlayHT.
- Custom model fine-tuning is gated behind enterprise contracts with significant minimum commitments.
- The cloud API requires internet connectivity by default; offline use requires the more expensive self-hosted tier.
- Documentation depth on advanced features (custom vocabulary tuning, on-prem ops) lags behind hyperscaler competitors.
- Audio files longer than ~4 hours typically need to be chunked client-side for optimal batch performance.
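The ~4-hour batch ceiling is straightforward to work around by computing chunk boundaries client-side before upload. A minimal sketch, where the 3-hour chunk length and 5-second overlap are illustrative choices on our part, not Deepgram recommendations:

```python
def chunk_spans(total_seconds, chunk_seconds=3 * 3600, overlap_seconds=5):
    """Yield (start, end) second offsets for client-side audio chunking.

    Chunks overlap slightly so a word split at a boundary lands whole in
    at least one chunk; duplicates can be deduped after transcription.
    A 3-hour chunk length stays safely under the ~4-hour mark.
    """
    spans = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end == total_seconds:
            break
        start = end - overlap_seconds
    return spans

# A 10-hour recording becomes four overlapping chunks:
print(chunk_spans(10 * 3600))
```

Feed each span to your audio slicer of choice, transcribe the pieces in parallel, and stitch the transcripts back together by offset.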
Deepgram's Nova model consistently posts the lowest word error rates in independent benchmarks, particularly on conversational audio with accents, crosstalk, or background noise. Real-world deployments report 15-30% relative WER reductions compared to Google Speech-to-Text and AWS Transcribe. Against AssemblyAI, Deepgram tends to win on streaming latency and pricing, while AssemblyAI is competitive on long-form batch accuracy. For multilingual conversational use, the new Flux model raises the bar further with built-in language detection across 10 languages.
Deepgram offers $200 in free credits on signup with no credit card required, which translates to roughly 750 hours of pre-recorded Nova transcription (streaming's higher per-minute rate yields fewer hours). Pay-as-you-go STT pricing starts around $0.0043 per minute for pre-recorded Nova and $0.0077 per minute for streaming, with TTS billed per character. Growth and Enterprise tiers offer volume discounts, committed-use contracts, and custom model training. This pricing is typically 50-75% below Google Cloud Speech and AWS Transcribe at comparable quality levels.
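To sanity-check the free-credit runway yourself, divide the credit balance by the per-minute rate. At the rates quoted above (verify current pricing before relying on them), $200 works out to roughly 775 hours pre-recorded or 433 hours streaming:

```python
def free_credit_hours(credit_usd, rate_per_min):
    """Estimate how many audio hours a credit balance buys at a given rate."""
    return credit_usd / rate_per_min / 60

# Rates as quoted in this review; check Deepgram's pricing page for current numbers.
prerecorded = free_credit_hours(200, 0.0043)
streaming = free_credit_hours(200, 0.0077)
print(round(prerecorded), round(streaming))  # → 775 433
```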
End-to-end speech-to-text latency is typically 100-300ms over the WebSocket streaming API, with interim results returned even faster. The unified Voice Agent API further compresses round-trip time by collocating STT, LLM orchestration, and TTS — eliminating the network hops you'd see when stitching three separate vendors together. The new Flux model adds intelligent endpointing so the system reliably knows when a user has stopped speaking, which is critical for natural turn-taking in phone-quality conversations.
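To see interim results in practice, you open a WebSocket to the live-transcription endpoint with your desired options in the query string, then stream raw audio frames. The sketch below only builds that URL; the parameter names follow Deepgram's streaming docs at the time of writing, so confirm them against the current API reference before use (authentication goes in an `Authorization: Token <key>` header on the connection itself):

```python
from urllib.parse import urlencode

def streaming_url(model="nova-2", language="en", interim_results=True):
    """Build the wss URL for Deepgram's live-transcription endpoint.

    interim_results=true asks the server to push partial hypotheses
    before each utterance is finalized, which is what keeps perceived
    latency in the low hundreds of milliseconds.
    """
    params = {
        "model": model,
        "language": language,
        "interim_results": str(interim_results).lower(),
    }
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)

print(streaming_url())
```

Connect to the printed URL with any WebSocket client, send audio bytes as binary frames, and read JSON transcript messages as they arrive.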
Yes — self-hosted deployment is one of Deepgram's key differentiators in the speech API category. Enterprise customers can run the same Nova and TTS models inside their own VPC, on-premises data centers, or air-gapped environments. This makes it viable for HIPAA-regulated medical transcription, financial services with data-residency rules, and government workloads. Most major cloud-only competitors do not offer a comparable self-hosted option.
Deepgram supports 30+ languages for transcription, with the new 2026 Flux model offering conversational STT in 10 languages (English, Spanish, German, French, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch), with automatic language detection. Beyond raw transcription, the Audio Intelligence API adds summarization, sentiment analysis, topic detection, intent recognition, speaker diarization, and smart formatting. These can be applied to both batch files and live streams via flags on the same API call.
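Those Audio Intelligence features are toggled as query flags on the same `/v1/listen` call used for plain transcription. A hedged sketch of assembling such a request URL (flag names match Deepgram's docs at the time of writing; confirm against the current API reference):

```python
from urllib.parse import urlencode

# Each flag enables one Audio Intelligence feature on top of transcription.
features = {
    "model": "nova-2",
    "smart_format": "true",
    "diarize": "true",     # speaker diarization
    "summarize": "v2",     # summarization
    "sentiment": "true",   # sentiment analysis
    "topics": "true",      # topic detection
    "intents": "true",     # intent recognition
}
url = "https://api.deepgram.com/v1/listen?" + urlencode(features)
print(url)
# POST the audio bytes (or a {"url": ...} JSON body) to `url` with an
# "Authorization: Token <DEEPGRAM_API_KEY>" header; the response carries
# the transcript plus one result section per enabled feature.
```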
Start with the free plan — upgrade when you need more.
Last verified March 2026