MS student, University of California, Berkeley
1 paper at NeurIPS 2025
We introduce a framework for evaluating and improving LLM consistency in simulated human dialogue. Our metrics correlate with human judgments and, when combined with multi-turn RL, reduce inconsistency across chit-chat, teaching, and mental health dialogue.