2 papers across 2 sessions
We introduce TRoVe, an automated approach for discovering error-inducing static feature biases learned by temporal vision-language models (VLMs).
We introduce SMMILE, the first multimodal medical benchmark for evaluating in-context learning abilities of vision-language models.