Assistant Professor, The Hong Kong University of Science and Technology
3 papers at NeurIPS 2025
The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset
We propose GPAS, a simple method that accelerates the pretraining convergence of LLMs by scaling activations without scaling their gradients.
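To make "scaling activations without scaling gradients" concrete, here is a minimal pure-Python sketch of the general idea, not the GPAS implementation. It assumes a fixed, hypothetical `scale` value and hand-written forward and backward functions; the summary above does not say how the scale is chosen or where in the network it is applied, so those details are illustrative only.

```python
# Illustrative sketch only: hand-written forward/backward passes,
# not the authors' code. All names and values here are hypothetical.

def scale_forward(x, scale):
    """Forward pass: every activation is multiplied by `scale`."""
    return [scale * v for v in x]

def preserved_backward(grad_out):
    """Backward pass with the gradient left unscaled.

    Equivalent to differentiating y = x + stop_gradient((scale - 1) * x):
    the forward value is scale * x, but the gradient w.r.t. x is 1.
    """
    return list(grad_out)

def naive_backward(grad_out, scale):
    """For comparison: plain scaling y = scale * x also scales the gradient."""
    return [scale * g for g in grad_out]

activations = [2.0, -4.0, 8.0]
scale = 0.5                      # assumed value, chosen for the example
upstream = [1.0, 1.0, 1.0]       # gradient arriving from later layers

scaled = scale_forward(activations, scale)
preserved = preserved_backward(upstream)
naive = naive_backward(upstream, scale)

print(scaled, preserved, naive)
```

In an autodiff framework the same effect is usually obtained with a stop-gradient (detach) operation rather than a custom backward function.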