Associate Professor, Peking University
3 papers at NeurIPS 2025
We embed personalized LLM watermarks into generated text via Sparse Autoencoders, requiring only inference-time sampling on black-box LLMs and no extra training, achieving high detection accuracy while preserving text quality.
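The paper's SAE-based mechanism is not reproduced here; as a minimal, hypothetical sketch of the general idea of inference-time watermark sampling on a black-box model (no weight access, no training), one can bias sampling toward a pseudo-random "green list" keyed on the previous token. All names (`green_list`, `watermark_sample`, `detect`) and the toy vocabulary are illustrative, not the paper's method:

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary
GREEN_FRACTION = 0.5                      # fraction of vocab marked "green" per step
BOOST = 4.0                               # logit boost applied to green-list tokens

def green_list(prev_token: str) -> set:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def watermark_sample(prev_token: str, logits: dict) -> str:
    """Pick the highest-scoring token after boosting green-list logits.

    Only the model's output distribution is needed, so this works with
    black-box access and adds nothing at training time.
    """
    greens = green_list(prev_token)
    return max(logits, key=lambda t: logits[t] + (BOOST if t in greens else 0.0))

def detect(tokens: list) -> float:
    """Fraction of tokens falling in their green list; ~0.5 for unwatermarked text."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)
```

Detection then reduces to a simple frequency test: watermarked text lands in the green list far more often than the ~50% expected by chance.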
This study introduces a causality-driven robust optimization approach that selectively updates the model components most sensitive to causal reasoning, enhancing the model's causal ability while preserving valuable pretrained knowledge and mitigating overfitting.
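The core idea of selective updating can be sketched as a masked gradient step: rank parameters by sensitivity (here approximated by gradient magnitude, an assumption for illustration) and update only the top fraction, freezing the rest to protect pretrained knowledge. The function name and hyperparameters below are hypothetical, not taken from the paper:

```python
def selective_update(params, grads, lr=0.1, top_frac=0.2):
    """One masked gradient step: update only the most sensitive parameters.

    params, grads: flat lists of floats (toy stand-in for model weights).
    top_frac: fraction of parameters allowed to move; the rest stay frozen,
    preserving the pretrained values and limiting overfitting.
    """
    k = max(1, int(len(params) * top_frac))
    # Rank coordinates by gradient magnitude as a sensitivity proxy.
    sensitive = sorted(range(len(params)), key=lambda i: -abs(grads[i]))[:k]
    new = list(params)
    for i in sensitive:
        new[i] -= lr * grads[i]
    return new
```

With `top_frac=0.2`, only 20% of the parameters move per step, so loss on the targeted objective still decreases while most of the pretrained state is untouched.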