PhD student, Columbia University
1 paper at NeurIPS 2025
We train reward models that encourage fairer step-by-step reasoning in LLMs, reducing bias on high-stakes decision-making tasks.