Clippers 10/17: Sam Stevens on mixture-of-experts (MoE) language models

In Clippers next week, I will present some early-stage planning for a mixture-of-experts (MoE) language model project I hope to pursue. The presentation will consist of:

  1. A literature review of neural MoE models in NLP
  2. How MoE models changed my thinking about model parallelism, FLOPs, and compute efficiency
  3. What this implies about GPT-4 (which is rumored to be an MoE model)
  4. Soft MoE: a recent paper that aims to solve many of the problems with MoE models, but applies the method only to vision (see the sketch after this list)
  5. Ideas I have for applying Soft MoE to language modeling
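
To make the contrast in #4 concrete, here is a minimal, hypothetical sketch in PyTorch (my own illustration, not either paper's code) of a standard top-k-routed MoE layer next to a Soft MoE layer in the style of Puigcerver et al. (2023). Class names, expert widths, and hyperparameters are placeholders chosen for readability.

```python
# Minimal sketch (assumed shapes: x is (batch, tokens, dim)); names and sizes are illustrative.
import torch
import torch.nn as nn


def make_expert(dim: int) -> nn.Module:
    # A small feed-forward expert; the 4x expansion is just a common default.
    return nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))


class TopKMoE(nn.Module):
    """Standard sparse MoE: each token is hard-routed to its top-k experts."""

    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(make_expert(dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                               # (B, T, E)
        weights, indices = logits.topk(self.k, dim=-1)        # pick k experts per token
        weights = weights.softmax(dim=-1)                     # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


class SoftMoE(nn.Module):
    """Soft MoE: each expert processes a few "slots", and every slot is a
    softmax-weighted mixture of all tokens, so routing is fully differentiable
    and no token is dropped."""

    def __init__(self, dim: int, n_experts: int = 8, slots_per_expert: int = 1):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(dim, n_experts * slots_per_expert) * dim ** -0.5)
        self.experts = nn.ModuleList(make_expert(dim) for _ in range(n_experts))
        self.n_experts = n_experts
        self.slots_per_expert = slots_per_expert

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = x @ self.phi                                  # (B, T, E*S)
        dispatch = logits.softmax(dim=1)                       # mix tokens into slots
        combine = logits.softmax(dim=-1)                       # mix slot outputs back to tokens
        slots = torch.einsum("btd,bts->bsd", x, dispatch)      # (B, E*S, dim)
        slots = slots.view(x.size(0), self.n_experts, self.slots_per_expert, -1)
        outs = torch.stack([expert(slots[:, e]) for e, expert in enumerate(self.experts)], dim=1)
        outs = outs.reshape(x.size(0), -1, x.size(-1))         # (B, E*S, dim)
        return torch.einsum("bsd,bts->btd", outs, combine)     # back to (B, T, dim)
```

The key difference: the top-k layer makes a hard, non-differentiable assignment of tokens to experts (the source of load-balancing and token-dropping headaches), while Soft MoE has every expert process soft mixtures of all tokens, keeping the whole layer differentiable.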

I hope that #1 and #2 will be valuable to everyone, because I think MoE models are underutilized in research, despite supposedly powering the best language model in the world (GPT-4).