MoBA: Mixture of Block Attention for Long-Context LLMs
#3512 Spotlight · Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Yutao Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu
MoBA is a dynamic sparse attention mechanism for long-context LLMs. It applies the Mixture of Experts (MoE) principle to attention: the context is partitioned into blocks, and a gating mechanism routes each query to its most relevant blocks, letting the model learn autonomously where to attend during training rather than imposing predefined structural biases such as fixed sliding windows.
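To make the routing idea concrete, below is a minimal single-head PyTorch sketch of block attention with MoE-style gating, written from the description above. It assumes one common instantiation: each block's keys are mean-pooled into a representative vector, the query-to-block affinity is their inner product, and each query attends only within its top-k past blocks (its own block always included). The function name `moba_attention`, the shapes, and the dense-softmax-with-mask formulation are illustrative assumptions, not the authors' kernel, which computes scores only for the selected blocks.

```python
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=4, top_k=2):
    """Illustrative single-head sketch of block attention with MoE-style
    gating (not the official implementation).

    q, k, v: [seq_len, d]. The context is split into blocks of `block_size`;
    each query is routed to its `top_k` highest-scoring blocks, where a
    block's score is the inner product between the query and the block's
    mean-pooled key. Attention is then restricted to the selected blocks.
    """
    seq_len, d = q.shape
    assert seq_len % block_size == 0, "sketch assumes divisible seq_len"
    n_blocks = seq_len // block_size

    # Mean-pool keys within each block -> [n_blocks, d]
    block_keys = k.view(n_blocks, block_size, d).mean(dim=1)

    # Gate scores: affinity of every query to every block -> [seq_len, n_blocks]
    gate = q @ block_keys.T

    # Causality at block granularity: a query may not route to future blocks.
    q_block = torch.arange(seq_len) // block_size
    future = torch.arange(n_blocks)[None, :] > q_block[:, None]
    gate = gate.masked_fill(future, float("-inf"))

    # Route each query to its top-k past blocks; always keep its own block.
    topk = gate.topk(min(top_k, n_blocks), dim=-1).indices  # [seq_len, top_k]
    allowed = torch.zeros(seq_len, n_blocks, dtype=torch.bool)
    allowed.scatter_(1, topk, True)
    allowed.scatter_(1, q_block[:, None], True)

    # Expand the block-level mask to token level; token-level causality
    # removes future positions inside the query's own block.
    token_mask = allowed.repeat_interleave(block_size, dim=1)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    mask = token_mask & causal

    # Masked attention: the full score matrix is computed here for clarity;
    # a real kernel would only compute scores for the selected blocks.
    scores = (q @ k.T) / d**0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example usage on random inputs.
q = torch.randn(16, 8)
k = torch.randn(16, 8)
v = torch.randn(16, 8)
out = moba_attention(q, k, v, block_size=4, top_k=2)  # [16, 8]
```

Because the gating is data-dependent rather than fixed, each query's sparsity pattern can differ, which is what distinguishes this kind of learned routing from predefined patterns like sliding windows.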