
Deepgram


Speech-to-text, text-to-speech and voice agent APIs with industry-leading latency, accuracy and per-language model quality.

Overview

Deepgram is a long-running speech AI platform that has quietly become the default STT engine behind a large share of production voice agents, contact-centre analytics tools and meeting bots. Its Nova-3 STT model delivers state-of-the-art word error rate across 30+ languages with sub-300ms streaming latency, includes diarisation, smart formatting and keyword boosting, and costs less per minute than competing managed providers.

Deepgram also ships Aura, a streaming TTS model designed for low-latency voice agents, and the Deepgram Voice Agent API, a single endpoint that combines STT, an LLM of your choice and Aura TTS with turn-taking handled server-side. It is the cleanest way to ship a phone-able agent if you want one vendor end-to-end. Beyond real-time, Deepgram has strong batch transcription for podcast and video workflows, with topic detection, entity extraction, summarisation and translation.

New customers start with a $200 credit, then pay metered per-minute rates that scale down with volume. Enterprise customers can run Deepgram fully on-prem for HIPAA and air-gapped use cases. Deepgram remains the default choice when accuracy per dollar matters more than brand cachet.

Using with OpenClaw

Integrate Deepgram with OpenClaw through the REST API or WebSocket connections for speech processing workflows and voice automation tasks.

Use case example: add voice capabilities to OpenClaw automation, including transcription, voice commands, and speech synthesis.

Vibe Coding Friendly?

Difficulty: beginner

No-code friendly: a well-documented REST API with SDKs for popular languages such as Python and JavaScript, suitable for no-code integration platforms.

Editorial Review

Deepgram offers the best price-to-performance ratio in speech-to-text, pairing Nova-3's industry-leading accuracy with sub-300ms real-time latency. The combined STT/TTS offering simplifies voice application development, though TTS voice variety is more limited than on specialized services.

Key Features

Speech-to-text APIs for streaming and prerecorded audio

Flux conversational STT for real-time voice agents with turn detection and interruption handling

Text-to-speech through Aura voices

Voice Agent API for full conversational voice workflows

Audio Intelligence add-ons including summarization, topic detection, sentiment and intent recognition

Pricing Plans

Free Tier: $200 credit
Pay-as-you-go: from $0.0043/min
Growth/Enterprise: contact sales

Getting Started with Deepgram

1. Sign up at deepgram.com and verify your email to receive $200 in free credits
2. Generate an API key from the Deepgram Console dashboard
3. Install the Deepgram SDK for your programming language (Python, JavaScript, etc.)
4. Test speech-to-text with a sample audio file using the provided quickstart examples
5. Integrate real-time streaming transcription using WebSocket connections for live audio
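
For readers who want to see what the first test call might look like without installing an SDK, here is a minimal Python sketch against the pre-recorded REST endpoint. It assumes the `https://api.deepgram.com/v1/listen` URL, the `Authorization: Token ...` header, the `model` and `smart_format` query parameters, and the `results.channels[0].alternatives[0].transcript` response path. All of these should be checked against the current API reference. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed pre-recorded transcription endpoint; verify against the API reference.
API_URL = "https://api.deepgram.com/v1/listen"


def build_transcribe_request(api_key: str, audio_url: str,
                             model: str = "nova-3") -> urllib.request.Request:
    """Build (but do not send) a POST request asking Deepgram to transcribe a hosted file."""
    query = urllib.parse.urlencode({"model": model, "smart_format": "true"})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # API key from the Console
            "Content-Type": "application/json",
        },
    )


def extract_transcript(response: dict) -> str:
    """Pull the first channel's top transcript out of a parsed JSON response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(api_key: str, audio_url: str) -> str:
    """Send the request and return the transcript (performs a network call)."""
    request = build_transcribe_request(api_key, audio_url)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_transcript(json.load(resp))
```

Calling `transcribe(os.environ["DEEPGRAM_API_KEY"], "https://example.com/sample.wav")` would run the whole round trip. The environment variable name and the sample URL are placeholders. The official SDKs wrap the same request, so this form is mainly useful for quick tests and for platforms where installing an SDK is awkward.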


Best Use Cases

Real-time STT inside voice agents

Contact-center call analytics at scale

Meeting and podcast transcription

Compliance-sensitive deployments needing on-prem STT

Integration Ecosystem

10 integrations

Deepgram works with these platforms and services:

LLM Providers: OpenAI, Anthropic
Cloud Platforms: AWS, GCP, Azure
Communication: Twilio, Vapi, Retell
Other: Zapier, webhooks

Limitations & What It Can't Do

We believe in transparent reviews. Here's what Deepgram doesn't handle well:

⚠ Costs can stack when STT, TTS, voice agent time and intelligence add-ons are combined
⚠ Custom models, enterprise deployment and higher support needs require sales conversations
⚠ Builders still need orchestration, telephony and app logic around the APIs

Pros & Cons

✓ Pros

✓ Best-in-class word error rate via Nova-3 model across 30+ languages
✓ Aggressively priced per-minute: from $0.0043/min beats most rivals
✓ Voice Agent API unifies STT + LLM + TTS with server-side turn-taking
✓ Free $200 credit lets teams prototype end-to-end without commitment
✓ On-prem deployment supports HIPAA and air-gapped environments

✗ Cons

✗ Aura TTS voice library smaller than ElevenLabs or Cartesia
✗ Documentation can feel dense for first-time integrators
✗ Some advanced features (diarisation tuning) require sales conversations
✗ Voice Agent API still maturing relative to Vapi or Retell AI for high-level orchestration

Frequently Asked Questions

How accurate is Deepgram compared to Google, AWS, and AssemblyAI?

Deepgram's Nova model consistently posts the lowest word error rates in independent benchmarks, particularly on conversational audio with accents, crosstalk, or background noise. Real-world deployments report 15-30% relative WER reductions compared to Google Speech-to-Text and AWS Transcribe. Against AssemblyAI, Deepgram tends to win on streaming latency and pricing, while AssemblyAI is competitive on long-form batch accuracy. For multilingual conversational use, the new Flux model raises the bar further with built-in language detection across 10 languages.

What does Deepgram cost and is there a free tier?

Deepgram offers $200 in free credits on signup with no credit card required. At the entry rates quoted here, that covers roughly 775 hours of pre-recorded Nova transcription or about 430 hours of streaming. Pay-as-you-go STT pricing starts around $0.0043 per minute for pre-recorded Nova and $0.0077 per minute for streaming, with TTS billed per character. Growth and Enterprise tiers offer volume discounts, committed-use contracts, and custom model training. This pricing is typically 50-75% below Google Cloud Speech and AWS Transcribe at comparable quality levels.
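
To sanity-check how far the signup credit stretches, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the rates are the two per-minute figures quoted in this answer, not a full price list.

```python
def credit_hours(credit_usd: float, rate_per_minute_usd: float) -> float:
    """Hours of audio a credit covers at a flat per-minute rate."""
    if rate_per_minute_usd <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_per_minute_usd / 60


# Rates as quoted in this review; actual pricing may differ by model and plan.
PRERECORDED_RATE = 0.0043  # USD per minute
STREAMING_RATE = 0.0077    # USD per minute

print(f"pre-recorded: {credit_hours(200, PRERECORDED_RATE):.1f} h")  # 775.2 h
print(f"streaming:    {credit_hours(200, STREAMING_RATE):.1f} h")    # 432.9 h
```

The same function can be reused for budgeting: pass a monthly spend cap instead of the credit to see how many hours of audio it buys at a given rate. TTS and intelligence add-ons are billed separately and are not covered by this estimate.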

What's the latency for real-time voice agents built on Deepgram?

End-to-end speech-to-text latency is typically 100-300ms over the WebSocket streaming API, with interim results returned even faster. The unified Voice Agent API further compresses round-trip time by co-locating STT, LLM orchestration, and TTS, which eliminates the network hops you'd see when stitching three separate vendors together. The new Flux model adds intelligent endpointing so the system reliably knows when a user has stopped speaking, which is critical for natural turn-taking in phone-quality conversations.
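
The distinction between interim results and finalized speech matters on the client side, so here is a small Python sketch of how a consumer of the streaming socket might turn messages into complete utterances. It assumes each message is a JSON string with a `type` of `"Results"`, boolean `is_final` and `speech_final` fields, and the transcript under `channel.alternatives[0].transcript`. Those field names are assumptions about the streaming response format. The function name is hypothetical, and no network code is shown.

```python
import json
from typing import Iterable, Iterator


def finalized_utterances(messages: Iterable[str]) -> Iterator[str]:
    """Yield one string per utterance, ignoring interim (non-final) results.

    Final segments are buffered until a message marks the end of speech
    (assumed field: speech_final), then joined and emitted.
    """
    parts: list[str] = []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") != "Results":
            continue  # skip metadata or other control messages
        text = msg["channel"]["alternatives"][0]["transcript"]
        if msg.get("is_final") and text:
            parts.append(text)
        if msg.get("speech_final") and parts:
            yield " ".join(parts)
            parts = []
    if parts:  # flush whatever was finalized before the stream closed
        yield " ".join(parts)
```

A voice agent would typically show interim text to the user for responsiveness but only hand the joined, end-of-speech utterance to the LLM, which is the behaviour this helper approximates.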

Can Deepgram be self-hosted for HIPAA or on-prem requirements?

Yes. Self-hosted deployment is one of Deepgram's key differentiators in the speech API category. Enterprise customers can run the same Nova and TTS models inside their own VPC, on-premises data centers, or air-gapped environments. This makes it viable for HIPAA-regulated medical transcription, financial services with data-residency rules, and government workloads. Most major cloud-only competitors do not offer a comparable self-hosted option.

Which languages and audio intelligence features does Deepgram support?

Deepgram supports 30+ languages for transcription, with the new 2026 Flux model offering conversational STT in 10 languages including English, Spanish, German, French, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch with automatic language detection. Beyond raw transcription, the Audio Intelligence API adds summarization, sentiment analysis, topic detection, intent recognition, speaker diarization, and smart formatting. These can be applied to both batch files and live streams via flags on the same API call.
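
As a rough illustration of "flags on the same API call", the Python sketch below builds a query string that switches on several features and then groups a diarized word list into speaker turns. The parameter names (`smart_format`, `diarize`, `topics`, `sentiment`, `intents`, `summarize`), the `v2` value for `summarize`, and the per-word `speaker`, `word` and `punctuated_word` fields are assumptions to verify against the current API reference. Both function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed boolean feature flags; names should be checked against the docs.
DEFAULT_FEATURES = ("smart_format", "diarize", "topics", "sentiment", "intents")


def listen_query(model: str = "nova-3", features=DEFAULT_FEATURES,
                 summarize: bool = False) -> str:
    """Build the query string that enables features on a transcription call."""
    params = {"model": model}
    params.update({name: "true" for name in features})
    if summarize:
        params["summarize"] = "v2"  # assumed value for the summarization flag
    return urlencode(params)


def speaker_turns(words: list[dict]) -> list[tuple[int, str]]:
    """Collapse a diarized word list into (speaker, text) turns in order."""
    turns: list[tuple[int, str]] = []
    for w in words:
        text = w.get("punctuated_word", w["word"])
        if turns and turns[-1][0] == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1] = (w["speaker"], turns[-1][1] + " " + text)
        else:
            turns.append((w["speaker"], text))
    return turns
```

Each enabled add-on is metered on top of base transcription, so in practice it is worth passing only the flags a workflow actually consumes.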

Security & Compliance

SOC2: Yes
GDPR: Yes
HIPAA: No (per this listing, although the Overview describes on-prem deployment for HIPAA use cases)
SSO: Yes
Self-Hosted: Yes (Enterprise, per the FAQ above)
On-Prem: Yes
RBAC: Yes
Audit Log: Yes
API Key Auth: Yes
Open Source: No
Encryption at Rest: Yes
Encryption in Transit: Yes
Data Retention: configurable
Data Residency: US

What's New in 2026

Deepgram launched Flux, a multilingual conversational speech-to-text model supporting 10 languages (English, Spanish, German, French, Hindi, Russian, Portuguese, Japanese, Italian, Dutch) with automatic language detection and intelligent endpointing optimized for voice agents. The unified Voice Agent API has been promoted as Deepgram's flagship offering, combining STT, LLM orchestration, and TTS in a single endpoint, alongside a deeper Amazon Connect integration for contact center deployments.

Alternatives to Deepgram

AssemblyAI

Speech AI APIs

Developer speech AI API platform for transcription, real-time speech-to-text, speech understanding, guardrails, and voice agents.

