Voxtral Transcribe 2 Pricing & Plans 2026

Complete pricing guide for Voxtral Transcribe 2. Compare all plans, analyze costs, and find the perfect tier for your needs.

🆓Free Tier Available

💎3 Paid Plans

⚡No Setup Fees

Choose Your Plan

Mistral Studio Audio Playground

Free

✓Test Voxtral Transcribe 2 directly in-browser
✓Upload up to 10 audio files
✓Toggle diarization and timestamp granularity
✓Add context bias terms
✓Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each

Start Free →

Voxtral Mini Transcribe V2 (API)

$0.003/min

✓Batch transcription via API
✓Speaker diarization with timestamps
✓Context biasing (up to 100 terms)
✓Word-level timestamps
✓Support for recordings up to 3 hours
✓13 languages supported

Start Free Trial →

Voxtral Realtime (API)

$0.006/min

✓Real-time streaming transcription
✓Configurable latency down to sub-200ms
✓13 languages supported
✓Purpose-built for voice agents and live applications
✓Matches batch accuracy at 2.4s delay

Start Free Trial →

Voxtral Realtime (Open Weights)

Free (Apache 2.0)

✓Full model weights on Hugging Face Hub
✓4B parameter footprint, runs on edge devices
✓Apache 2.0 license — commercial use allowed
✓Self-hosted, privacy-first deployment
✓GDPR/HIPAA-compatible on-prem use

Start Free →

Enterprise / Private Cloud

Contact Sales

✓GDPR and HIPAA-compliant deployments
✓Secure on-premise or private cloud setup
✓Dedicated support
✓Custom SLAs
✓Volume pricing

Contact Sales →

Pricing sourced from Voxtral Transcribe 2 · Last verified March 2026

Feature Comparison

Features	Mistral Studio Audio Playground	Voxtral Mini Transcribe V2 (API)	Voxtral Realtime (API)	Voxtral Realtime (Open Weights)	Enterprise / Private Cloud
Test Voxtral Transcribe 2 directly in-browser	✓	✓	✓	✓	✓
Upload up to 10 audio files	✓	✓	✓	✓	✓
Toggle diarization and timestamp granularity	✓	✓	✓	✓	✓
Add context bias terms	✓	✓	✓	✓	✓
Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each	✓	✓	✓	✓	✓
Batch transcription via API	—	✓	✓	✓	✓
Speaker diarization with timestamps	—	✓	✓	✓	✓
Context biasing (up to 100 terms)	—	✓	✓	✓	✓
Word-level timestamps	—	✓	✓	✓	✓
Support for recordings up to 3 hours	—	✓	✓	✓	✓
13 languages supported	—	✓	✓	✓	✓
Real-time streaming transcription	—	—	✓	✓	✓
Configurable latency down to sub-200ms	—	—	✓	✓	✓
Purpose-built for voice agents and live applications	—	—	✓	✓	✓
Matches batch accuracy at 2.4s delay	—	—	✓	✓	✓
Full model weights on Hugging Face Hub	—	—	—	✓	✓
4B parameter footprint, runs on edge devices	—	—	—	✓	✓
Apache 2.0 license — commercial use allowed	—	—	—	✓	✓
Self-hosted, privacy-first deployment	—	—	—	✓	✓
GDPR/HIPAA-compatible on-prem use	—	—	—	✓	✓
GDPR and HIPAA-compliant deployments	—	—	—	—	✓
Secure on-premise or private cloud setup	—	—	—	—	✓
Dedicated support	—	—	—	—	✓
Custom SLAs	—	—	—	—	✓
Volume pricing	—	—	—	—	✓

Is Voxtral Transcribe 2 Worth It?

✅ Why Choose Voxtral Transcribe 2

• Lowest published price point at $0.003/min for batch transcription, roughly one-fifth the cost of ElevenLabs Scribe v2
• Sub-200ms streaming latency makes it viable for real-time voice agents, with only 1-2% WER degradation versus offline mode
• Voxtral Realtime ships as open weights under Apache 2.0, enabling private on-device deployment for sensitive workloads
• Approximately 4% word error rate on FLEURS benchmark, beating GPT-4o mini Transcribe, Gemini 2.5 Flash, AssemblyAI Universal, and Deepgram Nova per Mistral's published comparisons
• Native multilingual support across 13 languages with strong non-English performance, not just English-first adaptation
• Long-form support up to 3 hours per request reduces chunking overhead for meetings and podcasts
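The 3-hour per-request limit still matters for longer recordings such as all-day events. A minimal sketch of how the split points could be planned, assuming only the duration is known up front; `plan_segments` is a hypothetical helper, not part of any Mistral SDK, and it does not try to cut on silence:

```python
import math

MAX_REQUEST_SECONDS = 3 * 60 * 60  # 3-hour per-request limit quoted above


def plan_segments(duration_s: float,
                  max_s: float = MAX_REQUEST_SECONDS) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds, each span no longer than max_s."""
    if duration_s <= 0:
        return []
    count = math.ceil(duration_s / max_s)
    # Equal-length spans avoid a very short trailing segment.
    length = duration_s / count
    return [(i * length, min((i + 1) * length, duration_s)) for i in range(count)]
```

A recording at or under the limit comes back as a single span, so the same code path can serve short and long files.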

⚠️ Consider This

• Context biasing is optimized for English; support for other languages is labeled experimental
• With overlapping speech, the model typically transcribes only one speaker rather than separating concurrent voices
• Only 13 languages supported, fewer than competitors like Whisper (99+) or Deepgram for niche language coverage
• Realtime model is open-weights but Mini Transcribe V2 is API-only, limiting self-hosted batch workflows
• Documentation and tooling are newer than incumbents like AssemblyAI or Deepgram, so ecosystem integrations are still maturing

Pricing FAQ

How much does Voxtral Transcribe 2 cost?

Voxtral Mini Transcribe V2 costs $0.003 per minute via API for batch transcription, and Voxtral Realtime costs $0.006 per minute for streaming. Mistral positions the batch rate as the lowest price point in the category, roughly one-fifth the cost of ElevenLabs Scribe v2 at comparable quality. Voxtral Realtime is also available as open weights under the Apache 2.0 license on Hugging Face, so self-hosters pay only infrastructure costs. A free audio playground in Mistral Studio is available for testing.
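The per-minute rates translate into a bill with simple multiplication. The sketch below uses the two published rates from this page; the function names and the fixed monthly self-hosting figure passed to `break_even_minutes` are illustrative assumptions, and volume discounts or free-tier credits are not modeled:

```python
BATCH_USD_PER_MIN = 0.003     # Voxtral Mini Transcribe V2 (API), from this page
REALTIME_USD_PER_MIN = 0.006  # Voxtral Realtime (API), from this page


def api_cost(minutes: float, usd_per_min: float) -> float:
    """Cost in USD for a number of audio minutes at a flat per-minute rate."""
    return minutes * usd_per_min


def break_even_minutes(monthly_infra_usd: float,
                       usd_per_min: float = REALTIME_USD_PER_MIN) -> float:
    """Monthly audio minutes at which the API bill equals a fixed self-hosting bill."""
    return monthly_infra_usd / usd_per_min
```

For example, a full 3-hour recording is 180 minutes, so `api_cost(180, BATCH_USD_PER_MIN)` works out to about $0.54 in batch mode and twice that through the realtime API.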

What languages does Voxtral support?

Both Voxtral Mini Transcribe V2 and Voxtral Realtime natively support 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. According to Mistral's FLEURS benchmark results, non-English performance significantly outpaces competitors. Note that the context biasing feature is optimized primarily for English, with support for other languages still considered experimental.
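A client could check the language and the context-bias list before sending a request. This is a rough sketch under stated assumptions: the ISO 639-1 codes are a convenient way to key the 13 languages named above (the page does not say how the API expects languages to be identified), the 100-term cap comes from the plan details, and `prepare_bias_terms` is a hypothetical helper rather than an SDK function:

```python
import warnings

# ISO 639-1 codes for the 13 languages listed above (keying scheme is an assumption).
SUPPORTED_LANGUAGES = {
    "en": "English", "zh": "Chinese", "hi": "Hindi", "es": "Spanish",
    "ar": "Arabic", "fr": "French", "pt": "Portuguese", "ru": "Russian",
    "de": "German", "ja": "Japanese", "ko": "Korean", "it": "Italian",
    "nl": "Dutch",
}
MAX_BIAS_TERMS = 100  # context biasing limit quoted in the plan details


def prepare_bias_terms(terms: list[str], language: str) -> list[str]:
    """Trim, de-duplicate, and validate context-bias terms for one language."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    cleaned = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    if len(cleaned) > MAX_BIAS_TERMS:
        raise ValueError(f"{len(cleaned)} terms exceeds the {MAX_BIAS_TERMS}-term limit")
    if cleaned and language != "en":
        warnings.warn("context biasing outside English is described as experimental")
    return cleaned
```

Raising on an oversized list, instead of silently truncating it, keeps the caller in control of which terms get dropped.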

How does Voxtral Realtime achieve sub-200ms latency?

Voxtral Realtime uses a streaming architecture that transcribes audio as it arrives, rather than adapting an offline model to process audio in chunks. Latency is configurable. At sub-200ms it is responsive enough for voice agents while staying within 1-2% word error rate of the batch model; at a 2.4-second delay it matches Voxtral Mini Transcribe V2's accuracy, which suits live subtitling. The 4B-parameter footprint also lets it run on edge devices for privacy-sensitive deployments.
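The latency and accuracy trade-off can be treated as a lookup over the two operating points quoted above. The sketch below is illustrative only: the profile names and `pick_profile` are hypothetical, 200 ms is used as an upper bound for "sub-200ms", and the page does not say how the delay is actually configured in the API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyProfile:
    name: str
    delay_ms: int
    accuracy_note: str


# The two operating points described in the text; names are assumptions.
PROFILES = (
    LatencyProfile("voice_agent", 200, "within 1-2% WER of the batch model"),
    LatencyProfile("live_subtitles", 2400, "matches batch accuracy"),
)


def pick_profile(max_delay_ms: int) -> LatencyProfile:
    """Pick the longest-delay (most accurate) profile that fits the delay budget."""
    fitting = [p for p in PROFILES if p.delay_ms <= max_delay_ms]
    if not fitting:
        raise ValueError(f"no profile fits a {max_delay_ms} ms budget")
    return max(fitting, key=lambda p: p.delay_ms)
```

Preferring the longest delay that still fits reflects the claim that accuracy improves as the delay grows, so an application gives up accuracy only when its budget demands it.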

Is Voxtral suitable for HIPAA or GDPR-regulated workflows?

Yes. Mistral states that both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups. The open-weights release of Voxtral Realtime under Apache 2.0 is particularly relevant here because it allows organizations to run transcription entirely within their own infrastructure, with no audio leaving their environment. This makes it well-suited for healthcare, legal, financial services, and other regulated industries.

How does Voxtral compare to Whisper, Deepgram, and AssemblyAI?

Per Mistral's published benchmarks, Voxtral Mini Transcribe V2 outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, AssemblyAI Universal, and Deepgram Nova on word error rate on FLEURS, while costing $0.003/min, significantly less than incumbents. It also processes audio approximately 3x faster than ElevenLabs Scribe v2 at one-fifth the cost. Compared to OpenAI Whisper (open source), Voxtral covers fewer languages (13 vs 99+) but offers higher accuracy in supported languages, plus a hosted API with diarization and streaming built in.

Compare Voxtral Transcribe 2 Pricing with Alternatives

Deepgram Pricing

Speech-to-text, text-to-speech and voice agent APIs with industry-leading latency, accuracy and per-language model quality.

Compare Pricing →

AssemblyAI Pricing

Developer speech AI API platform for transcription, real-time speech-to-text, speech understanding, guardrails, and voice agents.

Compare Pricing →
