To achieve personalization in LLMs, we leverage the user model to incorporate a curiosity-based intrinsic reward into multi-turn RLHF.
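The sentence above does not say how the curiosity-based intrinsic reward is computed from the user model. One plausible reading, which is an assumption here and not necessarily the paper's formulation, rewards the policy at each turn by how much the user model's belief about the user's hidden type improves. Improvement is measured as the gain in log-probability of the true type. That bonus is then added to the usual extrinsic RLHF reward. The function names `curiosity_reward` and `shaped_turn_rewards`, the weight `beta`, and the availability of a ground-truth `true_type` (as with a simulated user) are all hypothetical.

```python
import math
from typing import List, Sequence


def curiosity_reward(prev_belief: Sequence[float],
                     new_belief: Sequence[float],
                     true_type: int,
                     eps: float = 1e-12) -> float:
    """Hypothetical intrinsic reward: the increase in log-probability that the
    user model assigns to the user's true type after one dialogue turn."""
    # Clamp to eps so that a zero probability does not make math.log raise.
    p_prev = max(prev_belief[true_type], eps)
    p_new = max(new_belief[true_type], eps)
    return math.log(p_new) - math.log(p_prev)


def shaped_turn_rewards(beliefs: Sequence[Sequence[float]],
                        true_type: int,
                        extrinsic: Sequence[float],
                        beta: float = 0.1) -> List[float]:
    """Add a beta-weighted curiosity bonus to each turn's extrinsic reward.

    beliefs[t] is the user model's distribution over user types before turn t,
    so beliefs must contain one more entry than extrinsic (assumed convention).
    """
    if len(beliefs) != len(extrinsic) + 1:
        raise ValueError("need len(extrinsic) + 1 belief states")
    return [
        r + beta * curiosity_reward(beliefs[t], beliefs[t + 1], true_type)
        for t, r in enumerate(extrinsic)
    ]


if __name__ == "__main__":
    # Two user types; the first turn reveals information, the second does not.
    beliefs = [[0.5, 0.5], [0.8, 0.2], [0.8, 0.2]]
    print(shaped_turn_rewards(beliefs, true_type=0, extrinsic=[0.0, 1.0], beta=1.0))
```

Under this sketch, a turn that sharpens the user model's estimate of who the user is earns a bonus even when its extrinsic reward is zero. This is the kind of signal that could push a policy toward asking informative, personalizing questions. A turn that teaches the user model nothing earns only its extrinsic reward.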