Full Professor, University of California, Berkeley
3 papers at NeurIPS 2025
We introduce a method that allows adversarial optimization to be used in general-sum settings to train more robust and diverse policies.
We study the mechanism of chain of continuous thought on the graph reachability problem and show, both theoretically and empirically, that it can reason by maintaining a superposition of multiple search traces.