Full Professor, Shanxi University
2 papers at NeurIPS 2025
We introduce Risk-aware Direct Preference Optimization (Ra-DPO), a novel approach that incorporates risk-awareness by employing a class of nested risk measures.