MS student, Yonsei University
1 paper at NeurIPS 2025
Instance-level adaptive KL penalty control method for Direct Preference Optimization