5 papers across 3 sessions
We present ARGenSeg, a unified framework for multimodal understanding and pixel-level perception, achieving state-of-the-art performance of image segmentation.
we propose Seg-VAR, a novel framework that rethinks segmentation as a conditional autoregressive mask generation problem