5 papers across 3 sessions
We propose an Interactive Text-to-3D Scene Retrieval Method to handle the inherent limitations of text queries.
We propose CellCLIP, a multimodal contrastive learning framework for Cell Painting data. It aligns image profiles and perturbation descriptions using pretrained encoders and a novel channel-wise scheme.
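The paper's specific channel-wise scheme is not detailed here, but the general CLIP-style contrastive alignment it builds on can be sketched as a symmetric InfoNCE loss over paired image and text embeddings. This is a generic illustration (function names, temperature value, and the NumPy implementation are assumptions), not CellCLIP's actual implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # project embeddings onto the unit sphere so dot products are cosine similarities
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched (image, text) pairs sit on the
    diagonal of the batch similarity matrix and are treated as positives;
    all other pairs in the batch are negatives."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature        # (B, B) scaled similarities
    labels = np.arange(len(logits))           # positive index = diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))
txt = img + 0.01 * rng.normal(size=(4, 16))   # nearly matched pairs
loss = clip_style_loss(img, txt)
```

With well-aligned pairs the diagonal dominates and the loss approaches zero; in CellCLIP the two towers would be pretrained encoders for Cell Painting images and perturbation descriptions rather than raw arrays.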
We introduce GARE, a gap-aware retrieval framework that learns pair-specific increments to alleviate optimization tension and false-negative noise in cross-modal alignment, achieving better uniformity and semantic structure.
We propose a novel Cross-Scene Spatial Reasoning and Grounding task, along with a new baseline and a benchmark dataset.