Full Professor, University of Illinois at Urbana-Champaign
4 papers at NeurIPS 2025
The paper proposes a principled reward design framework for training LLMs on tool use via reinforcement learning, yielding significant gains in performance and generalization over SFT and baseline models.
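A composite reward for tool use might combine format and correctness terms. The sketch below is purely illustrative: the paper's actual reward terms are not specified here, and the tag names and weights are hypothetical.

```python
def tool_use_reward(response: str, expected_answer: str) -> float:
    # Hypothetical composite reward combining a format term (well-formed
    # tool invocation) and an outcome term (final answer correctness).
    reward = 0.0
    if "<tool_call>" in response and "</tool_call>" in response:
        reward += 0.2   # reward syntactically valid tool calls
    if expected_answer in response:
        reward += 1.0   # reward reaching the correct final answer
    return reward

print(tool_use_reward("<tool_call>search(q)</tool_call> 42", "42"))  # 1.2
```

Separating format and outcome rewards lets RL training shape valid tool-call syntax even before the model reliably reaches correct answers.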
We replace expensive language models with cheap neural networks to estimate the value of data, saving significant computation costs while maintaining performance.
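One way to realize this idea is to distill expensive LLM-assigned value scores into a cheap proxy model. The sketch below, with hypothetical features and a linear proxy standing in for a small neural network, fits the proxy by gradient descent on the mean squared error against the LLM's scores; new data can then be valued at negligible cost.

```python
def fit_value_proxy(features, llm_scores, lr=0.1, steps=500):
    # Train a tiny linear model to mimic data-value scores previously
    # produced by an expensive LLM. (Illustrative only: the feature
    # representation and proxy architecture are assumptions.)
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, llm_scores):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n  # d(MSE)/dw_i
            grad_b += 2 * err / n              # d(MSE)/db
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Fit the proxy on toy scores that follow value = 2 * feature + 1.
w, b = fit_value_proxy([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
```

Once fitted, scoring a new example is a single dot product instead of an LLM call.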
RL fine-tuning in LLMs updates only a small subnetwork, containing 20–30% of the parameters, while leaving the rest unchanged.
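The fraction of updated parameters can be measured by comparing checkpoints before and after fine-tuning. A minimal sketch, using plain Python dicts of weight lists in place of real model state dicts (an assumption for illustration):

```python
def updated_fraction(params_before, params_after, atol=1e-8):
    # Fraction of scalar parameters whose value moved by more than atol
    # between two checkpoints (dicts mapping layer name -> list of weights).
    changed = total = 0
    for name, before in params_before.items():
        after = params_after[name]
        for b_val, a_val in zip(before, after):
            if abs(b_val - a_val) > atol:
                changed += 1
            total += 1
    return changed / total

# Toy checkpoints: only one of eight scalars is touched, as in a sparse update.
before = {"layer1": [0.1, 0.2, 0.3, 0.4], "layer2": [0.5, 0.6, 0.7, 0.8]}
after  = {"layer1": [0.1, 0.2, 0.3, 0.4], "layer2": [0.9, 0.6, 0.7, 0.8]}
print(updated_fraction(before, after))  # 0.125
```

With real models the same comparison would run over the framework's checkpoint tensors rather than Python lists.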
We introduce MIRAGE, a benchmark for multimodal expert consultation in agriculture featuring single-turn and multi-turn tasks.