Postdoc, ETHZ - ETH Zurich
1 paper at NeurIPS 2025
We show that RLHF is vulnerable to strategic manipulation, discuss trade-offs between incentive alignment and policy alignment, and propose an approximately strategyproof algorithm to mitigate this vulnerability.