Assistant Professor, Georgia Institute of Technology
3 papers at NeurIPS 2025
M-Pilot uses a small, controllable language model to guide a large language model through complex tasks, improving the large model's reasoning, planning, and personalization capabilities.
We introduce MLE-Dojo, a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows.
We propose Think-RM, a training framework for generative reward models that enables long-horizon reasoning, and introduce a pairwise RLHF pipeline that directly optimizes policies using pairwise preference rewards.