Audio models handle speech-to-text transcription and audio understanding.

OpenAI
Whisper Large V3 Turbo
whisper-large-v3-turbo
Capabilities: Speech-to-text transcription
Strengths: Fast, accurate, multilingual
Best for: Audio transcription, voice-to-text applications
Configuration repo: tinfoilsh/confidential-audio-processing
Audio Format: Supports .mp3 and .wav files
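
Since only .mp3 and .wav files are listed as supported, it can help to check the file extension before sending audio for transcription. The following is a minimal sketch of such a check; the function name `is_supported_audio` and the constant `SUPPORTED_EXTENSIONS` are hypothetical helpers for illustration and are not part of any documented API.

```python
from pathlib import Path

# Formats listed for whisper-large-v3-turbo in this section.
# The constant and function names are illustrative assumptions.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    # Compare case-insensitively so "CLIP.MP3" is treated like "clip.mp3".
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

This only inspects the file name, not the file contents, so a mislabeled file would still pass the check.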

Mistral
Voxtral Small 24B
voxtral-small-24b
Parameters: 24B
Capabilities: Speech-to-text transcription, audio Q&A, summarization, translation, voice-triggered function calling
Audio Duration: Up to 30 minutes (transcription) or 40 minutes (understanding)
Languages: English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian
Best for: Speech transcription with automatic language detection, answering questions from spoken input, generating summaries from audio, and triggering functions from voice commands
Configuration repo: tinfoilsh/confidential-voxtral-small-24b
Audio + Text: Built on the Mistral Small 3.1 foundation, combining speech processing with strong text capabilities, including function calling from voice commands.
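
The duration limits listed for voxtral-small-24b (30 minutes for transcription, 40 minutes for understanding) can be checked on the client side before submitting audio. The sketch below assumes the clip length is already known in seconds; `MAX_MINUTES` and `fits_voxtral_limit` are hypothetical names used for illustration, and how the service itself handles longer audio is not specified here.

```python
# Duration limits in minutes, as listed for voxtral-small-24b above.
# The task labels and helper name are illustrative assumptions.
MAX_MINUTES = {"transcription": 30, "understanding": 40}


def fits_voxtral_limit(duration_seconds: float, task: str) -> bool:
    """Return True if a clip of the given length is within the listed limit."""
    if task not in MAX_MINUTES:
        raise ValueError(f"unknown task: {task!r}")
    # Reject empty or negative durations; the listed limit is inclusive.
    return 0 < duration_seconds <= MAX_MINUTES[task] * 60
```

For example, a 35-minute recording would fail the check for "transcription" but pass it for "understanding".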