PhD student, National University of Singapore
3 papers at NeurIPS 2025
We analyze the flow of tokens across attention layers and use these insights to enhance the performance of Transformers.
We propose a tree-sliced framework for the Partial Transport setting, based on the observation that the Partial Transport problem on tree-metric spaces can be reformulated as a standard Optimal Transport problem.
We investigate Linear Mode Connectivity (LMC) in Mixture-of-Experts (MoE) architectures by analyzing their underlying permutation symmetries and proposing expert-matching algorithms that align independently trained MoEs to reveal LMC.