Speech Recognition · Developer

FunASR

Industrial-grade open-source speech recognition toolkit from Alibaba — 170x realtime, 50+ languages, OpenAI-compatible API.

Starting at $0

Overview

FunASR is the open-source speech toolkit from Alibaba's ModelScope team and one of the most production-credible alternatives to OpenAI Whisper in 2026. It bundles a family of in-house models into a single toolkit with a unified Python API and a self-hostable HTTP server: Paraformer for non-autoregressive ASR, SenseVoice for multilingual recognition with emotion and event detection, CAM++ for speaker verification, and FSMN-VAD for voice activity detection.

The headline numbers are aggressive: 170x realtime decoding on a modern GPU (an hour of audio in roughly 21 seconds), 50+ languages, and robust performance on Chinese and other Asian languages where Whisper has historically struggled. Speaker diarisation, timestamping, punctuation and streaming are built in. The server speaks an OpenAI-compatible transcription API, so teams can swap it in behind existing Whisper integrations by repointing the client's base URL rather than rewriting client code.

FunASR has become the default ASR backbone for many Chinese-language voice agent stacks and is increasingly used worldwide by teams who want on-prem speech without paying per-minute cloud rates. It is Apache-licensed, ships pre-built Docker images for CPU and GPU inference, and integrates cleanly with the ModelScope hub for newer model releases.
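A minimal sketch of how the unified Python API might combine the models named in the overview. The model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`), the `batch_size_s` value, and the result shape (a list of dictionaries with a `text` key) are assumptions based on the FunASR README and should be checked against the installed version. The helper names are illustrative, not part of FunASR.

```python
from typing import Any


def load_pipeline() -> Any:
    """Build an ASR pipeline with VAD, punctuation and speaker models attached."""
    # Imported lazily so collect_text() stays usable without funasr installed.
    from funasr import AutoModel

    return AutoModel(
        model="paraformer-zh",  # assumed alias for the Paraformer ASR model
        vad_model="fsmn-vad",   # assumed alias for FSMN-VAD
        punc_model="ct-punc",   # assumed alias for the punctuation model
        spk_model="cam++",      # assumed alias for the CAM++ speaker model
    )


def collect_text(results: list[dict[str, Any]]) -> str:
    """Join the `text` field of each result entry, skipping empty ones."""
    return " ".join(item["text"] for item in results if item.get("text"))


def transcribe(path: str) -> str:
    """Transcribe one audio file; models are downloaded on first use."""
    pipeline = load_pipeline()
    # batch_size_s is assumed to cap the seconds of audio batched together.
    results = pipeline.generate(input=path, batch_size_s=300)
    return collect_text(results)
```

Calling `transcribe("meeting.wav")` (a hypothetical file name) would return the punctuated transcript as one string; timestamps and speaker labels, where the installed version provides them, would have to be read from the raw result entries instead.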

Vibe Coding Friendly?

Difficulty: intermediate

Suitability for vibe coding depends on your experience level and the specific use case.

Key Features

Feature information is available on the official website.

Pricing Plans

Open source

Best Use Cases

On-prem speech recognition without per-minute cloud fees

Chinese and multilingual voice agent stacks

Meeting transcription with speaker diarisation

Whisper replacement behind existing OpenAI clients
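The last use case can be sketched with the official `openai` Python SDK pointed at a self-hosted server. The base URL `http://localhost:8000/v1`, the model name `paraformer-zh`, and the assumption that the server ignores the API key are all hypothetical; substitute whatever your FunASR deployment actually exposes.

```python
from typing import Any


def transcription_endpoint(base_url: str) -> str:
    """URL the SDK will POST to for transcription requests."""
    return base_url.rstrip("/") + "/audio/transcriptions"


def make_client(base_url: str = "http://localhost:8000/v1") -> Any:
    """Create an OpenAI SDK client that talks to a self-hosted server."""
    # Imported lazily so this module loads without the openai package.
    from openai import OpenAI

    # The SDK requires a non-empty key; a self-hosted server is assumed to ignore it.
    return OpenAI(base_url=base_url, api_key="not-needed")


def transcribe(path: str, model: str = "paraformer-zh") -> str:
    """Send one audio file through the OpenAI-style transcription route."""
    client = make_client()
    with open(path, "rb") as audio:
        response = client.audio.transcriptions.create(model=model, file=audio)
    return response.text
```

Under these assumptions, existing Whisper code that already calls `client.audio.transcriptions.create` would only need the `base_url` and the model name changed.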

Pros & Cons

✓ Pros

✓ Apache 2.0 licensing: safe for commercial and on-prem deployment
✓ OpenAI-compatible API allows drop-in replacement for Whisper code paths
✓ Best-in-class Chinese and multilingual recognition compared with Whisper at similar compute
✓ Built-in diarisation, timestamps, and punctuation remove a layer of post-processing

✗ Cons

✗ Documentation is uneven, and some pieces are Chinese-only
✗ You take on the operational burden of running GPU inference
✗ The ModelScope catalogue moves fast, so version pinning matters
✗ For English-only audio, Whisper-large may still be the better choice depending on the use case
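One way to act on the version-pinning point is to keep model names and revisions in a single table and expand them into keyword arguments. This is a sketch only: it assumes `funasr.AutoModel` accepts a `<role>_revision` keyword for each model role (as in `model_revision` and `vad_model_revision`), and the revision string `v2.0.4` is a placeholder, not a recommendation.

```python
# Placeholder pins: replace names and revisions with the ones you have validated.
PINNED_MODELS: dict[str, tuple[str, str]] = {
    "model": ("paraformer-zh", "v2.0.4"),
    "vad_model": ("fsmn-vad", "v2.0.4"),
    "punc_model": ("ct-punc", "v2.0.4"),
}


def pinned_kwargs(pins: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Expand {role: (name, revision)} into AutoModel-style keyword arguments."""
    kwargs: dict[str, str] = {}
    for role, (name, revision) in pins.items():
        kwargs[role] = name
        # Assumed convention: each role has a matching "<role>_revision" keyword.
        kwargs[f"{role}_revision"] = revision
    return kwargs
```

The resulting dictionary could then be passed as `AutoModel(**pinned_kwargs(PINNED_MODELS))`, so upgrading a model becomes a reviewed one-line change rather than a side effect of a fresh download.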

Frequently Asked Questions

How much does FunASR cost?

FunASR pricing starts at $0. It is open source and has a single listed plan.
