Researcher, LG Corporation
1 paper at NeurIPS 2025
Instance-level adaptive KL penalty control method for Direct Preference Optimization