Thomas Kleine Buening

Postdoc, ETH Zurich

1 paper at NeurIPS 2025

Poster Session 1

1 paper

Wednesday, December 3, 2025 · 11:00 AM → 2:00 PM

Exhibit Hall C,D,E

Strategyproof Reinforcement Learning from Human Feedback

#408 · Thomas Kleine Buening, Jiarui Gan, Debmalya Mandal, Marta Kwiatkowska

We show that RLHF is vulnerable to strategic manipulation, discuss the trade-off between incentive alignment and policy alignment, and propose an approximately strategyproof algorithm to address this vulnerability.