Principal Researcher, Squirrel Ai Learning
5 papers at NeurIPS 2025
We embed personalized LLM watermarks responsively into generated text using sparse autoencoders, relying only on inference-time sampling from black-box LLMs with no extra training, and achieve high accuracy while preserving text quality.
This paper introduces CorrectBench, the first comprehensive benchmark for systematically evaluating self-correction mechanisms in LLMs.