Is Mamba the End of ChatGPT As We Know It?
Author(s): Ignacio de Gregorio
Originally published on Towards AI.
The Great New Question
Two researchers have made the boldest claim in years: that it may be time to throw the biggest algorithmic breakthrough of the 21st century out the window.
Named Mamba, it achieves what was once thought impossible: matching or beating the Transformer's language modeling capabilities while being faster and a lot cheaper.
Everyone seems to be talking about it, so let's uncover what Mamba is.
This insight, like others I share on Medium, previously appeared in my weekly newsletter, TheTechOasis.
If you want to be up-to-date with the frenetic world of AI while also feeling inspired to take action or, at the very least, to be well-prepared for the future ahead of us, this is for you.
Subscribe below to TheTechOasis, the newsletter to stay ahead of the curve in AI: thetechoasis.beehiiv.com
Since its release in 2017, the Transformer architecture has become the de facto choice for natural language modeling (models that generate text).
ChatGPT, Gemini, Claude, you name it, all are based on this seminal architecture.
The ubiquity of this architecture is such that the "T" in ChatGPT stands for "Transformer".
A sequence-to-sequence model (it takes a sequence as input, be that a text passage or a sequence of pixels in an image, and returns another sequence, usually new text), the Transformer's secret sauce is… Read the full blog for free on Medium.
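To make the sequence-to-sequence idea concrete, here is a minimal sketch in Python. The toy_seq2seq function below is purely an illustrative assumption of the interface (tokens in, tokens out), not the Transformer's actual mechanics:

```python
# Illustrates the sequence-to-sequence interface: a model maps an
# input sequence of tokens to an output sequence of tokens.
# The "model" here is a toy placeholder, assumed for illustration only.

def toy_seq2seq(input_tokens: list[str], max_new_tokens: int = 3) -> list[str]:
    """Toy 'model': reverses the input, then appends an end-of-sequence token."""
    output = list(reversed(input_tokens))           # stand-in for a real transformation
    return output[:max_new_tokens] + ["<eos>"]      # bounded generation plus end marker

if __name__ == "__main__":
    prompt = ["the", "cat", "sat"]                  # input sequence (e.g., a text passage)
    print(toy_seq2seq(prompt))                      # output sequence (new text)
```

A real Transformer or Mamba model replaces the toy body with learned computation, but the contract is the same: one sequence goes in, another comes out.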
Join over 80,000 data leaders on the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI