Assistant Professor, New York University
Two papers accepted at NeurIPS 2025
We investigate how to scale second-order optimizers effectively, showing that they outperform Adam and reduce data requirements in compute-optimal transformer training.