2 papers across 2 sessions
We propose PolyJuice, a black-box red-teaming method that steers text-to-image generative models to produce images that deceive a synthetic image detector.
We argue that conclusions drawn about relative system safety or attack method efficacy via AI red teaming are often not supported by evidence provided by attack success rate (ASR) comparisons.