3 papers across 3 sessions
This work introduces BlockRank, a method that makes LLMs efficient and scalable for in-context ranking by aligning their internal attention and inference mechanisms with the structure of the ranking task.
An efficient fine-tuning framework that adapts a pre-trained weight matrix by decomposing its update into two components: (1) a projection onto the complement of the subspace spanned by a low-rank matrix, and (2) a low-rank update.
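The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions, not the paper's actual method: all variable names (`W`, `A`, `Delta`, `B`, `C`) are hypothetical, and I assume "complement" means the orthogonal complement of the column space of the low-rank matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                              # hypothetical dimensions
W = rng.standard_normal((d, d))          # pre-trained weight matrix
A = rng.standard_normal((d, r))          # low-rank matrix spanning the subspace
Delta = rng.standard_normal((d, d))      # raw update (e.g., a gradient step)

# Orthogonal projector onto the complement of span(A):
# P = I - A A^+, where A^+ is the Moore-Penrose pseudoinverse.
P = np.eye(d) - A @ np.linalg.pinv(A)

# Separate low-rank update, LoRA-style (assumed form, rank r).
B = rng.standard_normal((d, r))
C = rng.standard_normal((r, d))

# Adapted weight = original + projected component + low-rank component.
W_new = W + P @ Delta + B @ C

# Sanity check: the projected component is orthogonal to span(A).
print(np.allclose(A.T @ (P @ Delta), 0.0, atol=1e-8))
```

The projector annihilates anything in span(A) (since P @ A = 0), so the projected term only changes directions outside that subspace, while the B @ C term carries the dedicated low-rank update.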