Audio models handle speech-to-text transcription and audio understanding.

Mistral
Voxtral Small 24B
voxtral-small-24b
Parameters: 24B
Capabilities: Speech-to-text transcription, audio Q&A, summarization, translation, voice-triggered function calling
Audio Duration: Up to 30 minutes (transcription) or 40 minutes (understanding)
Audio Format: Supports .mp3 and .wav files
Languages: English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian
Best for: Speech transcription with automatic language detection, answering questions from spoken input, generating summaries from audio, and triggering functions from voice commands
Configuration repo: tinfoilsh/confidential-voxtral-small-24b
Audio + Text: Built on the Mistral Small 3.1 foundation, the model combines speech processing with strong text capabilities, including function calling from voice commands.
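
The format and duration limits listed for this model can be checked locally before a file is uploaded. The following is a minimal sketch, not part of any official client: the names `check_audio`, `wav_duration_seconds`, `MAX_SECONDS`, and `SUPPORTED_SUFFIXES` are hypothetical helpers introduced here for illustration, and the task labels "transcription" and "understanding" simply mirror the two limits stated above. It uses only the Python standard library, which can measure .wav duration but not .mp3 duration, so for .mp3 files the caller is assumed to supply the duration.

```python
import wave
from pathlib import Path

# Limits copied from the model card, expressed in seconds.
# The task names are illustrative labels, not API parameters.
MAX_SECONDS = {"transcription": 30 * 60, "understanding": 40 * 60}
SUPPORTED_SUFFIXES = {".mp3", ".wav"}


def wav_duration_seconds(path):
    """Return the duration of a .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_audio(path, task="transcription", duration_seconds=None):
    """Return a list of problems; an empty list means no problem was found.

    duration_seconds may be passed explicitly (required for .mp3 if the
    duration should be checked); for .wav it is measured when omitted.
    """
    if task not in MAX_SECONDS:
        raise ValueError(f"unknown task: {task!r}")

    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        return [f"unsupported format: {suffix or '(none)'}"]

    # Only .wav can be measured with the standard library.
    if duration_seconds is None and suffix == ".wav":
        duration_seconds = wav_duration_seconds(path)

    problems = []
    limit = MAX_SECONDS[task]
    if duration_seconds is not None and duration_seconds > limit:
        problems.append(
            f"duration {duration_seconds:.0f}s exceeds the {limit}s limit for {task}"
        )
    return problems
```

For example, `check_audio("talk.mp3", task="transcription", duration_seconds=35 * 60)` reports one problem, while the same call with `task="understanding"` reports none, because a 35-minute recording fits within the 40-minute limit but not the 30-minute one. When no duration is available for an .mp3 file, the sketch skips the duration check rather than guessing.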