PhD student, Institute of Science Tokyo
One paper at NeurIPS 2025:
A novel architecture that extends mixture-of-experts (MoE) beyond the feed-forward layers to the attention layers as well, using a unified expert design and attention-FFN parameter sharing.
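A minimal sketch of the general idea, not the paper's implementation: a single pool of experts is routed to from both the attention sublayer and the FFN sublayer, so the two share expert parameters. All names (`SharedExpertPool`, `MoEBlock`, `top_k`, the routers) are hypothetical placeholders for illustration.

```python
import torch
import torch.nn as nn


class SharedExpertPool(nn.Module):
    """A pool of small FFN experts reused by both the attention and FFN sublayers."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor, router_logits: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); router_logits: (tokens, n_experts)
        weights, idx = router_logits.softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


class MoEBlock(nn.Module):
    """Transformer block where attention output refinement and the FFN use the same experts."""

    def __init__(self, d_model: int, n_heads: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.experts = SharedExpertPool(d_model, d_hidden, n_experts)
        self.attn_router = nn.Linear(d_model, n_experts)  # routes attention outputs
        self.ffn_router = nn.Linear(d_model, n_experts)   # routes FFN inputs
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sublayer: mix tokens, then pass each token through routed shared experts.
        h = self.norm1(x)
        mixed, _ = self.attn(h, h, h)
        flat = mixed.reshape(-1, mixed.size(-1))
        x = x + self.experts(flat, self.attn_router(flat)).view_as(mixed)

        # FFN sublayer: the same expert pool serves as the feed-forward network.
        h = self.norm2(x).reshape(-1, x.size(-1))
        x = x + self.experts(h, self.ffn_router(h)).view_as(x)
        return x


if __name__ == "__main__":
    block = MoEBlock(d_model=64, n_heads=4, d_hidden=128, n_experts=4)
    tokens = torch.randn(2, 10, 64)   # (batch, seq, d_model)
    print(block(tokens).shape)        # torch.Size([2, 10, 64])
```

The sketch only illustrates the structural point of sharing one expert pool across both sublayers; the paper's actual routing, expert granularity, and attention integration may differ.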