Assistant Professor, National University of Singapore
4 papers at NeurIPS 2025
We analyze the flow of tokens across attention layers and use these insights to enhance the performance of Transformers.
We propose a tree-sliced framework for the Partial Transport setting, based on the observation that the Partial Transport problem on tree-metric spaces can be reformulated as a standard Optimal Transport problem.
This paper introduces Angular Steering, a robust and generalized method for fine-grained behavior control in language models, unifying and extending existing steering techniques through rotation in a feature-isolating subspace.
We investigate Linear Mode Connectivity (LMC) in Mixture-of-Experts (MoE) architectures by analyzing their underlying permutation symmetries and proposing expert-matching algorithms that align independently trained MoEs to reveal LMC.