2 papers across 2 sessions
Unifies supervised & reinforcement fine-tuning and outperforms both, backed by theoretical justification.
Hierarchical Balance Packing, Efficient SFT