Principal Researcher, Microsoft
1 paper at NeurIPS 2025
NaDRO uses preference-based and context-based rewards to train LLMs on noisy data, enabling even smaller models to achieve superior performance on complex decision-making tasks.