PhD student, New York University
1 paper at NeurIPS 2025
We investigate how to scale second-order optimizers effectively, showing that they outperform Adam and reduce the data required for compute-optimal transformer training.