4 papers across 3 sessions
We propose a parallel generation method for LLMs in which multiple instances synchronize through a shared, dynamically updated attention cache (a toy sketch follows this list).
A report on the Big ANN competition at NeurIPS'23
Automatically detecting task-specific important tokens to accelerate speculative decoding
Alchemist: a compact (3.3k-sample) SFT dataset curated via diffusion-model filtering. Boosts T2I aesthetics and complexity across 5 SD models (weights released) while preserving diversity.
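A minimal, purely illustrative sketch of the shared-cache idea from the first item above. The `SharedCache`, `decode_step`, and the round-robin loop are stand-ins I've invented for exposition; the paper's actual model, synchronization mechanism, and API are not shown here.

```python
# Toy sketch: several decoder "instances" extend their own sequences while
# attending over a single cache that all of them append to, so each instance
# can condition on tokens produced by its peers. Hypothetical names throughout.

from dataclasses import dataclass, field


@dataclass
class SharedCache:
    # (instance_id, token) pairs, in the order they were generated
    entries: list = field(default_factory=list)

    def append(self, instance_id: int, token: str) -> None:
        self.entries.append((instance_id, token))

    def context(self) -> list:
        # Every instance reads the full shared context, including peers' tokens
        return list(self.entries)


def decode_step(instance_id: int, cache: SharedCache) -> str:
    # Stand-in for a real forward pass: a real model would attend over the
    # shared cache; here we just derive a token from the context length.
    ctx = cache.context()
    return f"tok{len(ctx)}@inst{instance_id}"


def parallel_generate(num_instances: int, steps: int) -> SharedCache:
    cache = SharedCache()
    for _ in range(steps):
        # Round-robin stands in for truly concurrent decoding; in practice
        # the instances run in parallel and synchronize on the cache.
        for i in range(num_instances):
            cache.append(i, decode_step(i, cache))
    return cache


if __name__ == "__main__":
    cache = parallel_generate(num_instances=3, steps=2)
    for instance_id, token in cache.entries:
        print(instance_id, token)
```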