Associate Professor, The Hong Kong Polytechnic University
3 papers at NeurIPS 2025
A novel MoE architecture that extends mixture-of-experts beyond feed-forward layers to attention layers as well, using a unified expert design and attention-FFN parameter sharing.
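The one-line description leaves the routing and sharing scheme unspecified. As an illustrative sketch only (the class names, top-1 routing, and shapes below are assumptions, not the paper's actual design), one way to let a single expert pool serve both the attention and FFN sublayers is:

```python
# Hedged sketch: a shared pool of experts reused by two sublayers.
# All names (UnifiedExpert, route) and design choices here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

class UnifiedExpert:
    """One expert: a small two-layer MLP whose parameters are reused by
    both the attention and FFN sublayers (attention-FFN sharing)."""
    def __init__(self, d_model, d_hidden):
        self.w_in = rng.standard_normal((d_model, d_hidden)) * 0.02
        self.w_out = rng.standard_normal((d_hidden, d_model)) * 0.02

    def __call__(self, x):
        return np.maximum(x @ self.w_in, 0.0) @ self.w_out  # ReLU MLP

def route(x, gate_w, experts):
    """Top-1 routing: each token is sent to its highest-scoring expert."""
    scores = x @ gate_w                  # (tokens, n_experts)
    choice = scores.argmax(axis=-1)      # (tokens,)
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            out[mask] = expert(x[mask])
    return out

d_model, d_hidden, n_experts, n_tokens = 16, 32, 4, 8
experts = [UnifiedExpert(d_model, d_hidden) for _ in range(n_experts)]
gate_attn = rng.standard_normal((d_model, n_experts)) * 0.02
gate_ffn = rng.standard_normal((d_model, n_experts)) * 0.02

x = rng.standard_normal((n_tokens, d_model))
# The attention-position sublayer routes tokens through the shared experts...
h = x + route(x, gate_attn, experts)
# ...and the FFN sublayer reuses the very same expert parameters,
# differing only in its gate.
y = h + route(h, gate_ffn, experts)
print(y.shape)  # (8, 16)
```

The key point of the sketch is that `experts` is allocated once and referenced by both sublayers, so parameters are shared while each sublayer keeps its own router.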