Associate Professor, Nanyang Technological University
3 papers at NeurIPS 2025
Our proposed orthogonality and variance losses improve downstream fine-tuning of Mixture-of-Experts models: they enhance expert specificity and counter the expert homogenization caused by load balancing, while still maintaining load balance.
A Dynamic and Scalable Reasoning Framework for Solving RPMs