1 paper across 1 session
We analyze which LLMs show large improvements from RL fine-tuning and propose behavior injection augmentation to prepare LLMs for RL.