Full Professor, Shanghai Jiao Tong University
2 papers at NeurIPS 2025
We introduce an unsupervised method for post-training multimodal large language models with GRPO, using implicit reward signals derived from majority voting over the model's own sampled answers.
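The summary above does not describe the exact reward design. The sketch below shows one common way majority voting can serve as an implicit reward inside GRPO, under stated assumptions. It assumes a binary reward: a sampled answer scores 1.0 if it matches the most frequent answer in its group and 0.0 otherwise. It then applies GRPO's group-relative normalization, which divides each reward's deviation from the group mean by the group standard deviation. The function names are hypothetical, and the use of a population standard deviation is an assumption, since implementations differ.

```python
from collections import Counter
import math


def majority_vote_rewards(answers):
    """Hypothetical binary pseudo-reward: 1.0 if an answer matches the group's
    most common answer, else 0.0. Ties are broken by Counter ordering (an assumption)."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (r - group mean) / (group std + eps).
    Uses the population std here; some implementations use the sample std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: five sampled final answers to one multimodal question (made-up data).
answers = ["B", "A", "B", "C", "B"]
rewards = majority_vote_rewards(answers)          # "B" is the majority answer
advantages = group_relative_advantages(rewards)   # positive for "B", negative otherwise
```

When every sample agrees, all rewards are equal and every advantage is zero, so that group contributes no policy-gradient signal. This matches the intuition that unanimous answers carry no relative preference.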
PANTHER is a hybrid framework that combines self-supervised generative pretraining with lightweight discriminative modeling to enable real-time fraud detection in large-scale payment platforms, delivering a 38% improvement in online performance.