New Delhi | Updated: Feb 8, 2026 11:57 AM IST
Indian AI start-up Sarvam has launched a new version of its text-to-speech AI model, with improvements in natural speech generation across Indian regions, scripts, and accents.
The new model, called Bulbul V3, offers more than 35 high-quality voices sourced from professional voice artists, with support for over 11 Indian languages, Sarvam said in a blog post on Thursday, February 5. The company plans to extend support to all 22 scheduled Indian languages in the near future.
Bulbul V3 is built on top of a large language model (LLM) that analyses text and converts it into AI-generated speech with prosodic elements such as pauses, emphasis, pacing, and tone modulation, making the output sound more natural. In its low-latency streaming output mode, the model lets users generate and play back audio in real time.
“This is critical for conversational applications, live interactions, and any experience where responsiveness directly impacts user engagement,” Sarvam said. “Indian speech is complex by default. People switch languages mid-sentence. Accents vary by region. Names, abbreviations, and emotions matter as much as words. To work in India, voice has to handle all of this without breaking,” the start-up added.
The model also lets users clone voices and create custom AI-generated voices. The consent-based voice cloning feature comes with built-in safeguards and is designed for high-volume enterprise use cases, according to Sarvam.
Bulbul V3 is the latest model in Sarvam’s planned 14-day rollout of AI tools, with one new launch every day, in the run-up to the widely anticipated India-AI Impact Summit 2026, to be held in New Delhi later this month. Sarvam is also one of 12 start-ups and entities selected by the Indian government to develop sovereign LLMs under the Rs 10,300-crore India AI Mission. These indigenously developed AI models are expected to be unveiled at the Summit, which will be held from February 16 to February 20, 2026.
For those looking to experiment with the new model, Bulbul V3 can be accessed via the Sarvam Dashboard. The company is also offering developers unlimited API access to the model until February 28, 2026.
Model testing and performance
As part of its testing, Sarvam said that Bulbul V3 was evaluated by an independent third party in a blind A/B human listening study across 11 languages. The test involved comparing paired audio samples generated from identical input text by Bulbul V3 and by competitors’ speech models.
While ElevenLabs v3 alpha topped the list for audio quality, Bulbul V3 outperformed Cartesia Sonic-3 and other rival models in general (full-band) evaluations, Sarvam said. The company further claimed that its new model beat all other models in 8 kHz (telephony) evaluations.
Bulbul V3 also showed “the lowest rates of word skips and mispronunciations, while maintaining comparable performance on extra-content errors,” Sarvam said.
Other releases
Here’s a list of AI models and tools released by Sarvam in recent days:
Sarvam Vision: A 3-billion-parameter vision-language model capable of a range of visual understanding tasks, including image captioning, scene text recognition, chart interpretation, and complex table parsing.
Sarvam Samvaad: Conversational AI agents that can be integrated with customers’ enterprise tools to take action and deliver insights based on proprietary data.
Sarvam Audio: An audio extension of Sarvam 3B, a 3-billion-parameter language model pre-trained on English and 22 Indian languages.
Sarvam Dub: An AI dubbing model with zero-shot voice cloning and precise timing control. Powered by cross-lingual speech models, it enables creators to dub podcasts, educational courses, and other content into multiple Indian languages.


