We train RL agents directly from high-level specifications, without reward functions or domain-specific oracles.