5 papers across 3 sessions
We show that RLHF is vulnerable to strategic manipulation, discuss the trade-offs between incentive alignment and policy alignment, and propose an approximately strategyproof algorithm that mitigates this vulnerability.
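The summary doesn't spell out the algorithm, so the sketch below only illustrates the failure mode and one classic approximately strategyproof aggregator, the coordinate-wise median (Moulin, 1980); the reward reports, the single exaggerating labeler, and the aggregation step are illustrative assumptions, not the paper's method.

```python
import numpy as np

def aggregate_mean(reports: np.ndarray) -> np.ndarray:
    # Mean aggregation: one exaggerated report shifts the result by
    # delta / n, so a labeler's influence grows with the size of the lie.
    return reports.mean(axis=0)

def aggregate_median(reports: np.ndarray) -> np.ndarray:
    # Coordinate-wise median: strategyproof in one dimension under
    # single-peaked preferences -- a labeler cannot push the output past
    # the neighboring honest reports, however much they exaggerate.
    return np.median(reports, axis=0)

rng = np.random.default_rng(0)
# Nine honest labelers report rewards for three candidate responses.
honest = rng.normal(loc=[1.0, 0.0, -1.0], scale=0.1, size=(9, 3))
# One strategic labeler inverts and wildly exaggerates their report.
liar = np.array([[-50.0, 0.0, 50.0]])
reports = np.vstack([honest, liar])

print("mean  :", aggregate_mean(reports))    # dragged toward the liar
print("median:", aggregate_median(reports))  # close to the honest consensus
```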
Theoretical inconsistencies among Responsible AI metrics aren't a problem but a benefit: they enable pluralistic approaches to alignment that respect diverse values, aid conceptual understanding, and create more robust, adaptable models.
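The paper's specific metrics aren't reproduced here, but a small worked example shows the kind of inconsistency at stake (the two groups, their base rates, and the choice of demographic parity versus equalized odds are illustrative assumptions). With unequal base rates, even a perfect classifier satisfies equalized odds while violating demographic parity, so no single model can satisfy both metrics at once:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two groups with different base rates of the positive label.
y_a = rng.binomial(1, 0.5, size=10_000)  # group A, base rate 0.5
y_b = rng.binomial(1, 0.2, size=10_000)  # group B, base rate 0.2

# A perfect classifier: predictions equal the true labels.
pred_a, pred_b = y_a.copy(), y_b.copy()

def rates(y, pred):
    tpr = pred[y == 1].mean()  # true positive rate
    fpr = pred[y == 0].mean()  # false positive rate
    return pred.mean(), tpr, fpr

sel_a, tpr_a, fpr_a = rates(y_a, pred_a)
sel_b, tpr_b, fpr_b = rates(y_b, pred_b)

# Equalized odds holds: TPR and FPR match across groups...
print(f"TPR: {tpr_a:.2f} vs {tpr_b:.2f}   FPR: {fpr_a:.2f} vs {fpr_b:.2f}")
# ...but demographic parity fails: selection rates track the base rates.
print(f"selection rate: {sel_a:.2f} vs {sel_b:.2f}")
```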