We propose a training-free visual cropping method that leverages MLLM-internal representations for VQA tasks focusing on small details, achieving strong performance with significantly higher efficiency than prior methods.
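As a rough illustration of what training-free cropping from a model-derived signal could look like, the sketch below picks the fixed-size window with the largest total score from a 2D importance map. The summary does not say which internal representations are used or how the crop is selected, so the importance map, the function name `best_crop_box`, and the sliding-window rule are all assumptions made for illustration, not the paper's method.

```python
def best_crop_box(importance, crop_h, crop_w):
    """Return (top, left, bottom, right) of the crop_h x crop_w window
    with the largest summed importance (bottom/right are exclusive).

    `importance` is a hypothetical 2D list of non-negative scores, e.g.
    some per-patch signal read out of a multimodal model. How such a map
    is obtained is NOT specified by the source; this is an assumption.
    """
    height = len(importance)
    width = len(importance[0]) if height else 0
    if not (0 < crop_h <= height and 0 < crop_w <= width):
        raise ValueError("crop size must be positive and fit inside the map")

    # Integral image (summed-area table) with a zero border row and column,
    # so any window sum can be read off with four lookups.
    integral = [[0.0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0.0
        for c in range(width):
            row_sum += importance[r][c]
            integral[r + 1][c + 1] = integral[r][c + 1] + row_sum

    best_sum, best_box = None, None
    for top in range(height - crop_h + 1):
        for left in range(width - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            window_sum = (integral[bottom][right] - integral[top][right]
                          - integral[bottom][left] + integral[top][left])
            # Strict comparison keeps the first window on ties.
            if best_sum is None or window_sum > best_sum:
                best_sum, best_box = window_sum, (top, left, bottom, right)
    return best_box


# Toy example: a 6 x 8 map with a small 2 x 2 region of high importance.
toy_map = [[0.0] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4):
        toy_map[r][c] = 1.0

box = best_crop_box(toy_map, crop_h=2, crop_w=2)
```

In this toy setup the returned box would be scaled from map coordinates to pixel coordinates before cropping the image and passing the crop back to the model; that scaling step is omitted here because it depends on the patch size of the specific model.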