2 min read | New Delhi | Apr 3, 2026 03:48 PM IST
Microsoft has launched MAI-Transcribe-1, its third in-house AI model, which it claims is the most accurate transcription model in the world.
With an average Word Error Rate of just 3.9 per cent, MAI-Transcribe-1 works across 25 languages, including English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese.
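For readers unfamiliar with the metric, Word Error Rate (WER) is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not Microsoft's evaluation code, the function name `word_error_rate` is hypothetical, and real benchmarks usually normalise casing and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Illustrative sketch only; splits on whitespace and applies no text
    normalisation (an assumption, real evaluations usually normalise).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}")
```

On this definition, a 3.9 per cent WER corresponds to roughly four word errors per 100 reference words.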
Microsoft’s new AI model ranks first on the industry-standard FLEURS benchmark in 11 core languages and outperforms models such as Whisper-large-v3 in the remaining 14 languages. It also beats the recently launched Google Gemini 3.1 Flash in 11 out of 14 languages. MAI-Transcribe-1 is available in Microsoft Foundry, where the company says its batch transcription speed is 2.5x faster than its Azure Fast offering, and it is priced at just $0.36 per hour.
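To put the quoted price in context, the short Python sketch below estimates what a batch job might cost. It assumes the $0.36 rate applies per hour of audio transcribed and is billed proportionally, neither of which the article confirms; the function name `estimate_batch_cost` is hypothetical and is not part of any Microsoft SDK.

```python
# Rate quoted in the article; assumed here to mean per hour of audio.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def estimate_batch_cost(
    audio_seconds: float,
    price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD,
) -> float:
    """Estimate transcription cost in USD for a given audio duration.

    Assumes simple pro-rata billing with no minimum charge or rounding,
    which may differ from the actual pricing terms.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    ten_hours = 10 * 3600
    print(f"Estimated cost: ${estimate_batch_cost(ten_hours):.2f}")
```

Under these assumptions, ten hours of recordings would cost about $3.60.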
The company says MAI-Transcribe-1 is highly accurate in all supported languages, making it an ideal choice for a wide range of speech-to-text use cases. While it does not support real-time transcription, Microsoft says it will add the feature in a future version. Alongside MAI-Transcribe-1, Microsoft also launched two new AI models, MAI-Image-2 and MAI-Voice-1, which, as their names suggest, can generate images and audio.
The tech giant says MAI-Voice-1 is its flagship voice generation model that can “generate natural, realistic speech, rich with nuance, emotional range and expression that preserves speaker identity” even in long-form content. Capable of generating 60 seconds of audio in just 1 second, MAI-Voice-1 is also GPU-efficient. It is available in Copilot Audio Expressions and Copilot Podcasts.
As for MAI-Image-2, Microsoft says it focuses on “efficiency and speed” and that its model family placed in the top three on the Arena.ai leaderboard. While Microsoft’s AI models may not be the largest or the fastest, the company hopes to sell them as cheaper alternatives to large language models from Google and OpenAI.


