Real-time AI voice model with emotion control and voice cloning capabilities for creating expressive, studio-quality audio content.
Fish Speech is an open-source text-to-speech (TTS) platform developed by Fish Audio that delivers real-time voice synthesis with fine-grained emotion control and zero-shot voice cloning. Built on a dual autoregressive architecture (VQGAN + Llama), it supports over 13 languages including English, Mandarin, Japanese, Korean, French, German, Arabic, and Spanish, making it one of the most multilingual open-source TTS solutions available as of early 2026.
The platform allows users to clone a voice from as little as 10–15 seconds of reference audio, producing natural-sounding speech that preserves the tone, cadence, and stylistic qualities of the source. Emotion control is achieved through prompt engineering and reference audio selection, enabling users to generate speech with specific emotional inflections such as happiness, sadness, anger, or calm without retraining the model.
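Cloning quality and emotional tone both depend on the reference clip, so a small pre-flight check can catch clips that are too short before you use them. The Python sketch below is illustrative only. It assumes uncompressed PCM WAV references and treats 10 seconds as the minimum, taken from the figure above. The `pick_reference` helper and its emotion-to-clip mapping are hypothetical and are not part of Fish Speech.

```python
import io
import wave
from typing import BinaryIO, Mapping, Union

# Assumption: minimum reference length taken from the "10–15 seconds" figure above.
MIN_REFERENCE_SECONDS = 10.0

WavSource = Union[str, BinaryIO]  # a file path or an open binary file object


def wav_duration_seconds(source: WavSource) -> float:
    """Return the duration of a PCM WAV clip in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pick_reference(emotion: str, library: Mapping[str, WavSource]) -> WavSource:
    """Hypothetical helper: return the reference clip recorded in the requested emotion.

    Raises KeyError if no clip exists for that emotion, and ValueError if the
    clip is shorter than MIN_REFERENCE_SECONDS.
    """
    if emotion not in library:
        raise KeyError(f"no reference clip for emotion {emotion!r}")
    clip = library[emotion]
    seconds = wav_duration_seconds(clip)
    if seconds < MIN_REFERENCE_SECONDS:
        raise ValueError(
            f"reference for {emotion!r} is {seconds:.1f} s, "
            f"below the {MIN_REFERENCE_SECONDS:.0f} s minimum"
        )
    return clip


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory silent 16-bit mono WAV clip, handy for trying the helpers."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

In practice, `library` would map emotion labels such as `"calm"` or `"happy"` to file paths of clips recorded in that style. Swapping the clip is the "reference audio selection" half of emotion control described above.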
Fish Speech operates with inference latency under 150 milliseconds on consumer-grade GPUs (RTX 3060 and above), enabling real-time or near-real-time voice generation suitable for interactive applications like chatbots, virtual assistants, and live content creation. The model weights are released under the Apache 2.0 license for the base model, with a non-commercial CC-BY-NC-SA 4.0 license for certain fine-tuned checkpoints.
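The simplest way to check the sub-150 ms figure on your own hardware is to time how long a streaming synthesis call takes to yield its first audio chunk. The sketch below assumes that "inference latency" means time to first chunk, which the text does not specify. `start_stream` is a hypothetical placeholder for whatever callable starts synthesis in your setup.

```python
import time
from typing import Callable, Iterable

# Figure quoted above for RTX 3060-class GPUs; treated here as a pass/fail budget.
LATENCY_BUDGET_MS = 150.0


def first_chunk_latency_ms(
    start_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Milliseconds from calling start_stream() until the first audio chunk is yielded."""
    start = clock()
    stream = iter(start_stream())
    if next(stream, None) is None:  # blocks until the first chunk arrives
        raise RuntimeError("stream ended without producing audio")
    return (clock() - start) * 1000.0


def within_budget(latency_ms: float, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if a measured latency is under the quoted real-time budget."""
    return latency_ms < budget_ms
```

A call would look like `first_chunk_latency_ms(lambda: engine.stream(text))`, where `engine.stream` is a placeholder rather than a real Fish Speech function. Averaging several runs after a warm-up call gives a steadier number, because the first GPU call often includes one-time setup cost.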
The Fish Audio cloud platform (fish.audio) provides a hosted API with managed infrastructure, removing the need for local GPU resources. The API supports streaming audio output, batch processing, and SSML-like markup for controlling pacing, pauses, and emphasis. As of version 1.5 (released Q1 2026), the model achieves a Mean Opinion Score (MOS) of approximately 4.1 out of 5.0 on standard speech naturalness benchmarks, competitive with proprietary solutions that cost significantly more.
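Because the hosted API streams audio, a client can write chunks to disk as they arrive instead of waiting for the full clip. The Python sketch below uses only the standard library. The endpoint URL, the JSON field names (`text`, `reference_id`, `format`), and bearer-token authentication are assumptions for illustration, so confirm them against the official fish.audio API reference before relying on them.

```python
import json
import urllib.request
from typing import BinaryIO, Iterable, Iterator, Optional

# Assumption: illustrative endpoint; check the official API reference.
API_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(
    text: str,
    api_key: str,
    reference_id: Optional[str] = None,
    audio_format: str = "mp3",
) -> urllib.request.Request:
    """Build a POST request for a text-to-speech call (field names are assumed)."""
    payload = {"text": text, "format": audio_format}
    if reference_id is not None:
        payload["reference_id"] = reference_id  # assumed: ID of a cloned voice
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def read_chunks(response, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the response body in fixed-size pieces as they become available."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk


def save_stream(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write audio chunks to `out` as they arrive; return the total bytes written."""
    total = 0
    for chunk in chunks:
        out.write(chunk)
        total += len(chunk)
    return total


# Usage (performs a real network call, so it is left commented out):
# req = build_tts_request("Hello from Fish Speech!", api_key="YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp, open("hello.mp3", "wb") as f:
#     save_stream(read_chunks(resp), f)
```

The request-building step and the chunk-copying steps are kept separate, so you can test each one without network access. The same `save_stream` function also works with any other iterable of audio bytes.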
Fish Speech has accumulated over 15,000 GitHub stars and an active open-source community contributing voice packs, language fine-tunes, and integration plugins for platforms such as Discord, OBS Studio, and Unity. The platform processes over 2 million API requests per month through its hosted service, serving creators, developers, and enterprises across gaming, audiobook production, accessibility tools, and customer service automation.
Pricing tiers: $0/month, $15/month, and custom pricing (contact sales).