Full Professor, The University of Tokyo
3 papers at NeurIPS 2025
We propose the first provably efficient and episode-wise safe RL algorithm for linear constrained MDPs.
This study reveals that reasoning models achieve superior performance by forming distinct reasoning graphs, characterized by greater cyclicity, larger diameters, and strong small-world properties.