What is Microsoft's MAI-1 Model?

Speculating on the future of in-house Microsoft AI models.

May 07, 2024

Microsoft Hires DeepMind Co-Founder to Lead Consumer AI Unit - WSJ

Hello Everyone,

While the world waits for Llama-3’s biggest model and its capabilities, there is another model we might be waiting for.

Microsoft’s newly formed Microsoft AI team is working on a fairly large new LLM called MAI-1. As the cost of developing the best models increases, Microsoft is well positioned to bear it. The story was first reported by The Information.

Of course, as usual, details are scarce.

The language model will be separate from OpenAI's GPT-4 and will be overseen by Mustafa Suleyman.
It’s said to have around 500 billion parameters and is likely roughly equivalent to GPT-4 Turbo, if not better.
It is separate from what Inflection AI has done in the past, but will be the default model for the new Microsoft AI team, which is made up of various former Inflection AI talent, among others.

What is the Purpose of the MAI-1 LLM?

It's the first standalone AI model the software giant is building since it poured $10 billion into OpenAI for the rights to power its generative AI tools, such as Copilot, with GPT-4, the model that underlies ChatGPT. It’s also not exactly clear why Microsoft thinks an in-house model is important.

Straightforward comments on Microsoft’s AI R&D from Microsoft CTO Kevin Scott, brought to us by Eric Horvitz, might help shed light on this:

“I'm not sure why this is news, but just to summarize the obvious: we build big supercomputers to train AI models; our partner Open AI uses these supercomputers to train frontier-defining models; and then we both make these models available in products and services so that lots of people can benefit from them. We rather like this arrangement. 😁 We've been at it for almost five years now. Each supercomputer we build for Open AI is a lot bigger than the one that preceded it, and each frontier model they train is a lot more powerful than its predecessors. We will continue to be on this path--building increasingly powerful supercomputer for Open AI to train the models that will set pace for the whole field--well into the future. There's no end in sight to the increasing impact that our work together will have.

We also, for years and years and years, have built AI models in MSR and in our product groups. AI models turn out to be interesting things to work on, and our researchers do great work studying and building them. AI models are used in almost every one of our products, services, and operating processes at Microsoft, and the teams making and operating things on occasion need to do their own custom work, whether that's training a model from scratch, or fine tuning a model that someone else has built. There will be more of this in the future too. Some of these models have names like Turing, and MAI. Some, like Phi for instance, we even open source.

I know the way I've said it isn't all that dramatic, but it is reality. And it's a plenty exciting reality for all us geeks given how hard all of this is to do in practice.”
