1 paper across 1 session
DRG-Sapphire, an LLM trained with GRPO, achieves state-of-the-art accuracy on diagnosis-related group (DRG) coding and shows that RL performance scales roughly logarithmically with the number of SFT examples.