Principal Researcher, RIKEN
1 paper at NeurIPS 2025
A lightweight approach that adaptively partitions the input image into patches to increase token information density and encode hierarchical spatial structure into the input embedding.