Intern, Tencent AI Lab
2 papers at NeurIPS 2025
Unsupervised Prefix Fine-Tuning Method for Reasoning Models
We introduce an RL framework that trains an LLM's reasoning and self-verification abilities simultaneously.