PhD student, University of Illinois Urbana-Champaign
1 paper at NeurIPS 2025
DRG-Sapphire, an LLM trained with GRPO, achieves state-of-the-art accuracy on DRG coding and shows that RL performance scales logarithmically with the number of SFT examples.