Zhenbang Wu

PhD student, University of Illinois Urbana-Champaign

1 paper at NeurIPS 2025

OpenReview · Semantic Scholar · Google Scholar

Poster Session 2

Wednesday, December 3, 2025 · 4:30 PM → 7:30 PM

Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding

#1708 Spotlight · Hanyin Wang, Zhenbang Wu, Gururaj J. Kolar, Hariprasad Reddy Korsapati, Brian Bartlett, Bryan Hull, Jimeng Sun

DRG-Sapphire, an LLM trained with GRPO, achieves state-of-the-art accuracy in DRG coding and shows that RL performance scales logarithmically with the number of SFT examples.