PhD student, Department of Computer Science, University of Toronto
1 paper at NeurIPS 2025
We train RL agents directly from high-level specifications, without reward functions or domain-specific oracles.