2 papers across 1 session
We train RL agents directly from high-level specifications, without reward functions or domain-specific oracles.
We apply the EKFAC preconditioner to Neumann series iterations to obtain an unbiased inverse-Hessian-vector product (iHVP) approximation for training data attribution (TDA) that improves the performance of both influence functions and unrolled differentiation.
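
As a rough illustration of the idea in the summary above, a preconditioned Neumann-style iteration for approximating H^{-1} v might look like the sketch below. This is not the paper's implementation: the function names are hypothetical, the matrix H is a tiny explicit toy example, and a simple diagonal (Jacobi) preconditioner stands in for EKFAC, which is assumed here only to play the same role of an approximate inverse of H.

```python
def matvec(H, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]


def preconditioned_neumann_ihvp(H, v, apply_precond_inv, num_steps=100):
    """Approximate H^{-1} v with the iteration x <- x + P^{-1} (v - H x).

    Unrolling this update gives a truncated Neumann series in (I - P^{-1} H),
    so it converges when the spectral radius of that matrix is below 1.
    `apply_precond_inv` is a hypothetical callable that applies P^{-1}.
    """
    x = [0.0] * len(v)
    for _ in range(num_steps):
        Hx = matvec(H, x)
        residual = [vi - hxi for vi, hxi in zip(v, Hx)]
        step = apply_precond_inv(residual)
        x = [xi + si for xi, si in zip(x, step)]
    return x


# Toy symmetric positive-definite matrix (an assumption for illustration).
H = [[4.0, 1.0], [1.0, 3.0]]
v = [1.0, 2.0]


def diag_precond_inv(r):
    # Diagonal preconditioner: a stand-in for EKFAC, NOT EKFAC itself.
    return [ri / H[i][i] for i, ri in enumerate(r)]


x = preconditioned_neumann_ihvp(H, v, diag_precond_inv)
```

In a realistic setting H would be available only through (possibly stochastic) Hessian-vector products rather than as an explicit matrix; the better the preconditioner approximates H, the fewer iterations the series needs.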