2 papers across 2 sessions
OS agents are vulnerable to Malicious Image Patches (MIPs) embedded in screenshots, a novel attack that poses significant security risks.
This paper introduces the first unified platform for evaluating text-to-video and video-to-text models across five key dimensions: safety, hallucination, fairness, privacy, and adversarial robustness.