Assistant Professor, Carnegie Mellon University
3 papers at NeurIPS 2025
We introduce a principled framework for validating LLM-as-a-judge systems under rating indeterminacy, where multiple ratings can be "correct."
We show that even exact unlearning, the gold standard for data removal from large language models, can leak sensitive information when an adversary applies guidance between pre- and post-unlearning checkpoints.
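To illustrate the general idea (this is a minimal sketch, not the paper's method): an adversary holding both checkpoints can decode with a guidance term that extrapolates away from the post-unlearning model toward the pre-unlearning one, amplifying exactly the behavior unlearning removed. The checkpoint names, the guidance coefficient `gamma`, and the specific logit combination below are all illustrative assumptions.

```python
# Hypothetical checkpoint-guidance decoding: combine next-token logits from
# pre- and post-unlearning checkpoints so content suppressed by unlearning
# is amplified. Checkpoint ids and gamma are placeholders, not real artifacts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PRE_CKPT = "org/model-pre-unlearning"    # hypothetical checkpoint id
POST_CKPT = "org/model-post-unlearning"  # hypothetical checkpoint id

tok = AutoTokenizer.from_pretrained(PRE_CKPT)
pre = AutoModelForCausalLM.from_pretrained(PRE_CKPT).eval()
post = AutoModelForCausalLM.from_pretrained(POST_CKPT).eval()

@torch.no_grad()
def guided_generate(prompt: str, gamma: float = 2.0, max_new_tokens: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        l_pre = pre(ids).logits[:, -1, :]    # next-token logits before unlearning
        l_post = post(ids).logits[:, -1, :]  # next-token logits after unlearning
        # Extrapolate past the post-unlearning model toward (and beyond) the
        # pre-unlearning one; gamma = 1 recovers the pre-unlearning model,
        # gamma > 1 exaggerates whatever the unlearning step removed.
        l_guided = l_post + gamma * (l_pre - l_post)
        next_id = l_guided.argmax(dim=-1, keepdim=True)  # greedy decoding
        ids = torch.cat([ids, next_id], dim=-1)
    return tok.decode(ids[0], skip_special_tokens=True)

print(guided_generate("The removed document stated that"))
```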