Assistant Professor, University of Washington
6 papers at NeurIPS 2025
We develop a variance-aware gap-dependent regret bound with better $H$ dependence for tabular MDPs.
We provide a computationally efficient algorithm that achieves $O(H)$ deployment cost with polynomial sample complexity.
A new minimalist example for understanding the Edge of Stability and Progressive Sharpening phenomena.
We build a benchmark for attribute-focused text-to-image retrieval and propose a pipeline that uses promptable image embeddings to solve it, yielding performance gains.
RLVR on LLMs needs only one example to achieve significant improvement on math tasks.
We theoretically analyze the benefit of filtering a noisy training dataset on model performance in multimodal contrastive learning, and identify two regimes with different amounts of gain.