Inside AI: AI21 Labs Jamba
In this video Mike talks to Yuval Belfer from AI21 Labs about LLM architectures, Mamba, Mixture of Experts, and Jamba at AWS's gen AI loft space in San Francisco.
The Transformer architecture has been at the center of generative AI for text generation for the last several years, but researchers have of course always been looking at what's going to come next: how can we break through the barriers of Transformers and get even more intelligence and more performance, at a cost of compute that's achievable? Some researchers devised the Mamba architecture. Mamba architectures were super interesting and performed pretty well, but they weren't quite there. AI21 Labs saw this, combined the Mamba architecture with Transformers and some mixture of experts, and came up with a model they called Jamba. I wanted to find out a lot more about Jamba and Mamba, and why don't we talk about some mixture of experts as well. I spoke to Yuval Belfer from AI21 Labs here at the AWS generative AI Loft in San Francisco, and I started off by just asking the question: what is Jamba?

"So Jamba is a novel architecture that interleaves layers of Transformer, Mamba, and mixture of experts in order to overcome the main problems of the Transformer architecture, which are speed and memory consumption." Okay, I love this. Okay, so in the de…
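To make that interleaving concrete, here is a minimal PyTorch sketch (not AI21's implementation) of a Jamba-style block: mostly Mamba layers with a single attention layer, and a mixture-of-experts MLP replacing the dense MLP in every other layer, roughly the 1-attention : 7-Mamba ratio described in the Jamba paper. The `SimplifiedMambaMixer` here is only a stand-in (a gated depthwise convolution), not a real selective state-space scan, and all class names, dimensions, and the exact placement of the attention layer are illustrative assumptions.

```python
# Illustrative sketch of a Jamba-style block: interleaved Mamba / attention
# layers, with MoE MLPs in every other layer. Not AI21's code.
import torch
import torch.nn as nn


class SimplifiedMambaMixer(nn.Module):
    """Placeholder for a selective state-space (Mamba) mixer.

    A real Mamba layer runs a selective SSM scan over the sequence; this
    stand-in only mimics the interface with a gated causal depthwise
    convolution so the overall block structure stays runnable.
    """

    def __init__(self, d_model: int, d_conv: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.conv = nn.Conv1d(d_model, d_model, d_conv, padding=d_conv - 1, groups=d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        h, gate = self.in_proj(x).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return self.out_proj(torch.nn.functional.silu(gate) * h)


class TopKMoE(nn.Module):
    """Mixture-of-experts MLP: each token's output mixes its top-k experts.

    Naive dense loop for clarity; a production MoE dispatches tokens so that
    only the selected experts do work.
    """

    def __init__(self, d_model: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):
        weights, idx = torch.topk(self.router(x).softmax(-1), self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e).unsqueeze(-1)
                out = out + mask * weights[..., slot : slot + 1] * expert(x)
        return out


class JambaStyleLayer(nn.Module):
    """One layer: a Mamba *or* attention mixer, then a dense MLP or MoE."""

    def __init__(self, d_model: int, use_attention: bool, use_moe: bool, n_heads: int = 8):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.use_attention = use_attention
        if use_attention:
            # Causal masking is omitted here for brevity.
            self.mixer = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        else:
            self.mixer = SimplifiedMambaMixer(d_model)
        self.mlp = TopKMoE(d_model) if use_moe else nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        if self.use_attention:
            h, _ = self.mixer(h, h, h, need_weights=False)
        else:
            h = self.mixer(h)
        x = x + h
        return x + self.mlp(self.norm2(x))


class JambaStyleBlock(nn.Module):
    """Eight layers: 1 attention + 7 Mamba, with MoE in every other layer."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.layers = nn.ModuleList(
            JambaStyleLayer(d_model, use_attention=(i == 4), use_moe=(i % 2 == 1))
            for i in range(8)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    block = JambaStyleBlock(d_model=256)
    tokens = torch.randn(2, 32, 256)   # (batch, seq, d_model)
    print(block(tokens).shape)         # torch.Size([2, 32, 256])
```

Because only one layer in eight uses attention, the KV cache (and the quadratic attention cost) only grows for that layer, which is the memory and speed win the interview goes on to discuss.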
Chapters
- 0:00 - Intro
- 1:02 - What is Jamba?
- 1:26 - What’s wrong with the Transformer architecture?
- 4:00 - KV Cache
- 6:10 - Mixture of Experts
- 10:41 - RNN and Mamba
- 17:22 - Jamba
- 23:55 - Benchmarking
- 25:40 - Public weights
- 27:20 - Getting started with Jamba