5 papers across 3 sessions
We propose a framework for efficient MoE post-training on 3.5D wafer-scale chiplets.
We introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning depth and efficiency without additional training or complex heuristics.
We present a novel MoE architecture that extends the mixture-of-experts paradigm to both attention and feed-forward layers, with unified expert designs and attention-FFN parameter sharing.