PhD student, Imperial College London
2 papers at NeurIPS 2025
We propose SRPO, a reflection-aware reinforcement learning (RL) method that significantly improves multimodal LLM reasoning by explicitly teaching self-reflection, and that outperforms state-of-the-art models on multiple benchmarks.
NOVA is an extreme out-of-distribution (OOD) stress-test dataset of ~900 multimodal brain MRI scans (with 281 rare pathologies) for benchmarking vision-language models (VLMs) on three clinical tasks: anomaly localization, captioning, and diagnostic reasoning.