Lab Leader, Language Lab, LG AI Research
1 paper at NeurIPS 2025
Instance-level adaptive KL penalty control method for Direct Preference Optimization