We propose a parallel generation method for LLMs in which multiple instances synchronize through a shared, dynamically updated attention cache.
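The mechanics of the shared cache are not detailed in this summary; the sketch below is only a hypothetical illustration of the general idea, with all names (SharedKVCache, decode_worker) invented for the example. Several decoding workers extend one cache under a lock, so each worker's next step can condition on tokens produced by all workers; cache entries are simplified to tuples in place of real per-layer key/value tensors.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class SharedKVCache:
    """Dynamically updated cache visible to every parallel decoding instance (illustrative only)."""
    entries: list = field(default_factory=list)              # (worker_id, step, token) triples
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def append(self, worker_id: int, step: int, token: str) -> None:
        # Each instance publishes its newly generated token to the shared cache.
        with self._lock:
            self.entries.append((worker_id, step, token))

    def snapshot(self) -> list:
        # Consistent view of the cache for the next attention/decoding step.
        with self._lock:
            return list(self.entries)

def decode_worker(worker_id: int, cache: SharedKVCache, num_steps: int) -> None:
    for step in range(num_steps):
        context = cache.snapshot()                            # "attend" over the shared history
        token = f"w{worker_id}-t{step}(ctx={len(context)})"   # placeholder for a sampled token
        cache.append(worker_id, step, token)

if __name__ == "__main__":
    cache = SharedKVCache()
    workers = [threading.Thread(target=decode_worker, args=(i, cache, 3)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(cache.entries), "cached entries written by 4 parallel instances")
```

In a real system the lock-protected list would be replaced by per-layer key/value tensors and a synchronization scheme suited to GPU execution; the sketch only shows the shared, incrementally growing cache that parallel instances read from and write to.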