5 papers across 3 sessions
We present a model-aware approach that uses the model’s own signals to select training data dynamically, markedly improving both training efficiency and data efficiency in RL fine-tuning.