Instructor, Yandex
1 paper at NeurIPS 2025
We propose a parallel generation method for LLMs in which multiple instances synchronize through a shared, dynamically updated attention cache.
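A toy sketch of the core idea, purely illustrative and not the paper's implementation: several generator "instances" take turns emitting tokens, and each reads a shared cache, holding entries contributed by all instances, before producing its next token. `SharedCache` and `generate_step` are hypothetical names, and the string tokens stand in for real attention and decoding.

```python
from dataclasses import dataclass, field

@dataclass
class SharedCache:
    # Each entry records which instance contributed which token.
    entries: list = field(default_factory=list)  # (instance_id, token)

    def append(self, instance_id, token):
        self.entries.append((instance_id, token))

    def visible_tokens(self):
        # Every instance sees the combined context from all instances.
        return [tok for _, tok in self.entries]

def generate_step(instance_id, cache):
    # Stand-in for attention + decoding: the next token depends on the
    # length of the shared context, so instances influence each other.
    token = f"i{instance_id}t{len(cache.visible_tokens())}"
    cache.append(instance_id, token)
    return token

cache = SharedCache()
# Round-robin "parallel" generation: 2 instances, 3 rounds each.
outputs = {0: [], 1: []}
for step in range(3):
    for inst in (0, 1):
        outputs[inst].append(generate_step(inst, cache))

print(outputs)
# Each instance's tokens reflect context written by the other instance,
# which is the synchronization effect the shared cache provides.
```

In a real system the shared structure would hold key/value tensors rather than strings, and concurrent writers would need synchronization, which this sequential round-robin loop sidesteps.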