PhD student, Emory University
3 papers at NeurIPS 2025
A novel semi-supervised learning paradigm that unifies view-wise co-training, meta-learned supervision, and adversarial perturbation through a structured triadic game.
We propose GPAS, a simple method that scales activations without scaling their gradients, accelerating the pretraining convergence of LLMs.
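The core idea of scaling an activation in the forward pass while leaving its gradient untouched can be sketched with a stop-gradient identity: y = x + sg((s - 1) * x), so the forward value is s * x but the derivative with respect to x remains 1. The sketch below is illustrative only (a toy dual-number autodiff, not the paper's actual implementation); the names `Dual`, `stop_gradient`, and `gpas` are assumptions for this example.

```python
# Toy forward-mode autodiff to illustrate gradient-preserving activation
# scaling: forward output is scaled by s, gradient w.r.t. x is unchanged.

class Dual:
    """Carries a value and its derivative w.r.t. the input."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)
    __rmul__ = __mul__

def stop_gradient(x):
    # Pass the value through; block the derivative.
    return Dual(x.val, 0.0)

def gpas(x, s):
    # Forward: x + (s-1)*x = s*x.  Backward: the (s-1)*x term is wrapped
    # in stop_gradient, so d(gpas)/dx = 1, as if no scaling happened.
    return x + stop_gradient((s - 1.0) * x)

x = Dual(2.0, 1.0)      # seed dx/dx = 1
y = gpas(x, 0.5)
print(y.val, y.grad)    # forward value halved, gradient preserved at 1.0
```

Running this prints a forward value of 1.0 (scaled) with a gradient of 1.0 (unscaled). In a real framework the same identity is usually written with `detach()` (PyTorch) or `jax.lax.stop_gradient` (JAX).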
In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) that deeper layers are much less effective than expected.