Associate Professor, Michigan State University
2 papers at NeurIPS 2025
We propose PolyJuice, a black-box red-teaming method that steers text-to-image generative models to produce images that deceive a synthetic image detector.
We introduce Obliviator, a nonlinear concept erasure method that guards against nonlinear adversaries. For any level of protection against the unwanted attribute, our method achieves higher task performance, revealing an empirical upper bound on the trade-off between protection and task performance.