Researcher, Google
1 paper at NeurIPS 2025
To personalize LLMs, we use a user model to incorporate a curiosity-based intrinsic reward into multi-turn reinforcement learning from human feedback (RLHF).
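The summary does not say how the user model produces the curiosity reward. The sketch below shows one plausible formulation and is not the paper's method. It assumes the hypothetical user model outputs a probability distribution over a discrete set of user types after each turn, and it rewards the agent by how much a turn raised the log-probability of the user's true type. It then adds that intrinsic term to the per-turn RLHF reward with a hypothetical weight `beta`. All function names and the `beta` value are illustrative.

```python
import math
from typing import Dict, List

Belief = Dict[str, float]  # hypothetical user-model output: P(user type) after a turn


def log_prob_true_type(belief: Belief, true_type: str) -> float:
    """Log-probability the (assumed) user model assigns to the user's true type."""
    return math.log(max(belief[true_type], 1e-12))  # clamp to avoid log(0)


def curiosity_rewards(beliefs: List[Belief], true_type: str) -> List[float]:
    """Assumed intrinsic reward for turn t: gain in log P(true type) from belief t to belief t+1."""
    return [
        log_prob_true_type(after, true_type) - log_prob_true_type(before, true_type)
        for before, after in zip(beliefs, beliefs[1:])
    ]


def shaped_rewards(
    extrinsic: List[float], beliefs: List[Belief], true_type: str, beta: float = 0.1
) -> List[float]:
    """Per-turn RLHF reward plus a beta-weighted curiosity bonus (beta is hypothetical)."""
    intrinsic = curiosity_rewards(beliefs, true_type)
    if len(intrinsic) != len(extrinsic):
        raise ValueError("need one more belief than there are turns")
    return [r + beta * c for r, c in zip(extrinsic, intrinsic)]


# Usage: two turns; the first turn reveals the user's type, the second adds nothing.
beliefs = [{"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2}, {"a": 0.8, "b": 0.2}]
bonus = curiosity_rewards(beliefs, true_type="a")
total = shaped_rewards([1.0, 0.0], beliefs, true_type="a", beta=0.1)
```

Under this assumed form the per-turn bonuses telescope: their sum equals the total gain in log-probability of the true type over the conversation, so the agent is credited for what it learns about the user rather than for the number of questions it asks.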