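REN extracts object-centric region tokens from frozen vision-backbone features using point prompts, without running a segmentation model. Compared with SAM-based approaches, it generates region tokens about 60× faster and uses about 35× less memory, while performing strongly across the downstream tasks it was evaluated on.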
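To make "region tokens from frozen features via point prompts" concrete, here is a minimal NumPy sketch of the general idea: a query is initialized from the frozen feature at each prompted point, then refined by a few cross-attention blocks that attend over all patch tokens. This is an illustration under stated assumptions, not REN's implementation. The function names, the nearest-patch query initialization, the single-head attention, the residual update, and the random weights (which would be trained in practice) are all hypothetical. Points are given directly as (row, col) coordinates on the patch grid rather than as pixel coordinates.

```python
import numpy as np

def point_queries(features, points):
    """Hypothetical query init: take the frozen feature at each prompted patch.

    features: (H, W, D) frozen patch features from a vision backbone.
    points:   (N, 2) integer (row, col) coordinates on the patch grid.
    """
    rows, cols = points[:, 0], points[:, 1]
    return features[rows, cols]                      # (N, D)

def cross_attention(queries, features, w_q, w_k, w_v):
    """Single-head cross-attention: point queries attend over all patch tokens."""
    h, w, d = features.shape
    tokens = features.reshape(h * w, d)              # (P, D) keys/values source
    q = queries @ w_q                                # (N, Dk)
    k = tokens @ w_k                                 # (P, Dk)
    v = tokens @ w_v                                 # (P, Dv)
    logits = q @ k.T / np.sqrt(q.shape[-1])          # (N, P) scaled dot products
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over patches
    return attn @ v, attn                            # (N, Dv), (N, P)

def region_tokens(features, points, params, num_blocks=2):
    """Stack a few cross-attention blocks with residual updates (assumed design).

    The backbone features are never modified; only the queries are updated.
    The residual add requires Dv == D.
    """
    x = point_queries(features, points)
    for w_q, w_k, w_v in params[:num_blocks]:
        out, _ = cross_attention(x, features, w_q, w_k, w_v)
        x = x + out
    return x                                         # (N, D): one token per prompt

# Usage with random stand-ins for frozen features and (untrained) weights.
rng = np.random.default_rng(0)
H, W, D = 16, 16, 32
features = rng.standard_normal((H, W, D)).astype(np.float32)
points = np.array([[3, 4], [10, 12]])                # two point prompts
params = [tuple(rng.standard_normal((D, D)) * 0.1 for _ in range(3)) for _ in range(2)]
tokens = region_tokens(features, points, params)
print(tokens.shape)  # (2, 32)
```

The design point this sketch illustrates is why such an approach can be cheap: each prompt costs one small attention pass over already-computed patch features, instead of producing and pooling a full segmentation mask per object.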