We introduce CARES, an 18K-prompt benchmark for evaluating the medical safety of LLMs under adversarial conditions, featuring graded harm levels, jailbreak attacks, and a fine-grained response metric.