5 papers across 3 sessions
We propose RF-Agent, a framework that automates reward function design for reinforcement learning (RL) via language agent tree search.
This work improves Monte Carlo Tree Search for symbolic regression through an extreme bandit strategy and evolution-inspired state-jumping actions.
We build a Monte Carlo tree over the diffusion denoising process, enabling scalable, compute-efficient, inference-time alignment of pretrained diffusion models to new reward functions.