LLM finetuning

2 papers across 2 sessions

Poster Session 2

1 paper

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Shape it Up! Restoring LLM Safety during Finetuning

#1302 · ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee, Duen Horng Chau

Poster Session 4

1 paper

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Exhibit Hall C,D,E

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards

#214 · Charles Arnal, Gaëtan Narozniak, Vivien Cabannes, Yunhao Tang, Julia Kempe, Remi Munos

In the context of off-policy RL, we give a theoretical analysis of how an additive reward correction improves performance, accompanied by experiments on bandits and LLM post-training.