PhD student, Hong Kong Polytechnic University
3 papers at NeurIPS 2025
We propose GPAS, a simple method that scales activations in the forward pass without scaling their gradients, accelerating the pretraining convergence of LLMs.
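A minimal PyTorch sketch of the core idea: scale an activation in the forward pass while the backward pass sees an identity, via a stop-gradient (`detach`). The helper name `gpas_like_scale` and the fixed scalar `scale` are illustrative assumptions, not the paper's exact gating formulation.

```python
import torch

def gpas_like_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Forward value is scale * x, but the scaled term is detached from
    # the graph, so the backward pass sees an identity: dL/dx is NOT
    # multiplied by scale.
    return x + (scale * x - x).detach()

x = torch.randn(4, requires_grad=True)
y = gpas_like_scale(x, 0.5)
y.sum().backward()
print(torch.allclose(y, 0.5 * x))                  # True: activations scaled
print(torch.allclose(x.grad, torch.ones_like(x)))  # True: gradients unscaled
```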
In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) that deeper layers are much less effective than expected.
We propose Adaptive Classifier-Free Guidance (A-CFG), which dynamically re-masks low-confidence tokens to form a more targeted unconditional input for guidance in iterative generative models.
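A rough sketch of the re-masking idea, assuming a masked-token iterative decoder in PyTorch: re-mask only the lowest-confidence positions to build the unconditional input, then apply the standard classifier-free guidance combination. `model`, `MASK_ID`, and the fixed `mask_ratio` are hypothetical placeholders, not the paper's exact procedure.

```python
import torch

MASK_ID = 0  # hypothetical [MASK] token id

def acfg_step(model, tokens, mask_ratio=0.25, guidance=2.0):
    """One illustrative guidance step in the A-CFG spirit."""
    cond_logits = model(tokens)                    # (B, L, V) conditional pass
    conf = cond_logits.softmax(-1).max(-1).values  # per-token confidence (B, L)

    # Re-mask only the least confident positions to build a *dynamic*
    # unconditional input, rather than masking the whole sequence.
    k = max(1, int(mask_ratio * tokens.size(1)))
    low_conf = conf.topk(k, dim=-1, largest=False).indices
    uncond_tokens = tokens.clone()
    uncond_tokens.scatter_(1, low_conf, MASK_ID)

    uncond_logits = model(uncond_tokens)           # unconditional pass
    # Standard classifier-free guidance combination of the two passes.
    return uncond_logits + guidance * (cond_logits - uncond_logits)
```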