Empathic voice AI: the EVI 3 speech-to-speech model with real-time prosody understanding, Octave expressive TTS, and emotion/expression measurement APIs for voice, face, and video.
Hume AI is a research-first lab focused on measuring and generating emotional expression in voice, face, and speech. Its flagship product, EVI (Empathic Voice Interface), is a speech-to-speech model: instead of stitching together separate speech-to-text (STT), LLM, and text-to-speech (TTS) stages, EVI listens, reasons, and speaks end-to-end while reading the user's prosody (tone, sighs, laughter, hesitation) and adjusting its own delivery in response. EVI 3, the current generation, lets developers build voice agents that handle interruptions, sound natural in emotionally charged conversations, and use any LLM as the reasoning backbone via tool calls. Octave is Hume's standalone expressive TTS model, with controllable emotion, voice style, and acting direction. The Measurement APIs analyze audio, video, or face data along dozens of expressive dimensions and are used in academic research, market research, content moderation, and qualitative analytics.
Pricing tiers: $0, $3–$14/month, $70/month, $200–$500/month, and Custom.