We propose MK-CAViT, a multi-kernel Vision Transformer whose attention is based on Hirschfeld-Gebelein-Rényi (HGR) maximal correlation, enabling efficient multi-scale feature learning.