3 papers across 2 sessions
We introduce Weaver, a framework that combines multiple weak verifiers to effectively select responses in repeated sampling, achieving frontier model accuracy without supervised fine-tuning, while reducing verification costs by 99.97%.
Automatically detecting task-specific important tokens to accelerate speculative decoding
We propose grafting, a simple approach to materialize new architectures by editing pretrained diffusion transformers. It enables architectural exploration under small compute budgets.