Microsoft's New AI Models Go Beyond Just Text

Microsoft is doubling down on AI models that aren’t large language models. The company announced on Thursday that it’s releasing three new models: brand new models for voice and text transcription, and the second generation of its in-house image model.

The voice and text transcription models are the first of their kind from Microsoft. The transcription model can transcribe recordings into text in 25 different languages. It’s built for video captioning, meeting transcription and voice agents. The voice model can create audio recordings up to 60 seconds long. The company says its second-generation image model generates images faster and produces more lifelike depictions than its predecessor. All three models are available now in Microsoft’s Foundry and MAI playground, and the company plans to bring MAI-Image-2 to Bing and PowerPoint. Developers can check out pricing info here.

These new models are a clear sign that Microsoft is looking to expand its offerings across the AI market. Microsoft’s Copilot is one of the most popular chatbots for businesses, especially those that already use the Microsoft 365 suite and Azure cloud service. Aside from the now-outdated original image model, Microsoft has primarily focused on text-based models, trying to distinguish itself among its many competitors as a secure, enterprise-friendly option. Its newest AI tools, Copilot Cowork and Copilot Health, are proof of that.

The models are also a reminder that Microsoft, as a legacy tech company, has the cash and compute to burn on these kinds of “side quests” that even billion-dollar start-ups like OpenAI can’t always afford. Last week, OpenAI confirmed that it will be discontinuing its Sora AI video app, saying it will refocus on core activities. The AI industry in 2026 has been aiming to prove its tools are useful in the workplace, especially with Anthropic’s Claude Code leapfrogging the competition.

Generative media models, like those that power AI image and video generation, require a lot of compute and energy to run, resources that could be spent elsewhere. Google, another legacy tech company with billions of its budget allocated to AI research, indicated this week that it won’t be giving up on generative media but will be trying to make models more cost- and energy-efficient, as with its new Veo 3.1 Lite video model.
