4 papers across 3 sessions
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
We introduce a novel bilevel system prompt optimization problem and propose a meta-learning framework to tackle it.