Microsoft is doubling down on AI models that aren’t large language models. The company announced on Thursday that it’s releasing three new models: brand-new models for voice generation and speech transcription, and the second generation of its in-house image model.
The voice and transcription models are the first of their kind from Microsoft. The transcription model can convert recordings into text in 25 languages. It’s built for video captioning, meeting transcription and voice agents. The voice model can create audio recordings up to 60 seconds long. The company says its second-generation image model, MAI-Image-2, generates images faster and produces more lifelike depictions than its predecessor. All three models are available now in Microsoft’s Foundry and MAI playground, and Microsoft plans to bring MAI-Image-2 to Bing and PowerPoint in the future. Microsoft has also published pricing information for developers.
These new models are a clear sign that Microsoft is looking to expand its offerings across the AI market. Microsoft’s Copilot is one of the most popular chatbots for businesses, especially those that already use the Microsoft 365 suite and the Azure cloud service. Aside from the now-outdated original image model, Microsoft has primarily focused on text-based models, trying to distinguish itself among its many competitors as a secure, enterprise-friendly option. Its newest AI tools, Copilot Cowork and Copilot Health, are proof of that.
The models are also a reminder that Microsoft, as a legacy tech company, has the cash and compute to burn on these kinds of “side quests” that even billion-dollar start-ups like OpenAI can’t always afford. Last week, OpenAI confirmed that it will discontinue its Sora AI video app, saying it will refocus on core activities. The AI industry in 2026 has been aiming to prove its tools are useful in the workplace, especially with Anthropic’s Claude Code leapfrogging the competition.
Generative media models, like those that power AI image and video generation, require a lot of compute and energy to run, resources that could be spent elsewhere. Google, another legacy tech company with billions of its budget allocated to AI research, indicated this week that it won’t be giving up on generative media but will try to make its models more cost- and energy-efficient, as with its new Veo 3.1 Lite video model.
