We propose joint recall (a novel synthetic task) and hybrid sparse attention with context-dependent sparsity for better sub-quadratic long-context modeling.