Assistant Professor, University of Edinburgh
4 papers at NeurIPS 2025
We integrate discrete diffusion models with neurosymbolic predictors for scalable and calibrated learning and reasoning.
Inference-time hyper-scaling uses key–value cache compression with Dynamic Memory Sparsification (DMS) to boost Transformer LLM reasoning accuracy at equivalent compute or memory cost.
We introduce a principled and effective method for distillation across tokenizers, enabling a range of new applications.