Understanding and Improving Fast Adversarial Training against $l_{0}$ Bounded Perturbations

Keywords: adversarial robustness, adversarial training

Abstract

This work studies fast adversarial training against sparse adversarial perturbations bounded by the $l_{0}$ norm. We first demonstrate the unique challenges of employing $1$-step attacks on $l_{0}$ bounded perturbations, especially catastrophic overfitting (CO), which cannot be properly addressed by existing fast adversarial training methods for other $l_{p}$ norms ($p \geq 1$). We highlight that CO in $l_{0}$ adversarial training arises from the sub-optimal perturbation locations of the $1$-step attack. Although strategies such as multi-$\epsilon$ can mitigate this sub-optimality to some extent, they in turn lead to unstable training. Theoretical and numerical analyses also reveal that the loss landscape of $l_{0}$ adversarial training is more craggy than that of its $l_{\infty}$, $l_{2}$, and $l_{1}$ counterparts, which exacerbates CO.
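
To make the threat model concrete: an $l_{0}$ bounded perturbation may change at most $k$ input coordinates, so a 1-step attack has to commit to its perturbed locations after a single gradient evaluation. The following is a minimal Python sketch of that idea, not the attack used in this work. The function name `one_step_l0_perturbation`, the budget `k`, the box bounds `lo` and `hi`, and the heuristic of ranking coordinates by gradient magnitude are all illustrative assumptions.

```python
def one_step_l0_perturbation(x, grad, k, lo=0.0, hi=1.0):
    """Illustrative 1-step sparse perturbation (assumed heuristic, not the paper's attack).

    Selects the k coordinates with the largest |grad| and moves each one to the
    box bound (lo or hi) indicated by the sign of its gradient. Returns the
    perturbed input and the perturbation, which has at most k nonzero entries.
    """
    if len(x) != len(grad):
        raise ValueError("x and grad must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank coordinates by gradient magnitude, largest first.
    order = sorted(range(len(x)), key=lambda i: abs(grad[i]), reverse=True)
    x_adv = list(x)
    for i in order[:k]:
        if grad[i] > 0:
            x_adv[i] = hi  # increasing this coordinate increases the loss
        elif grad[i] < 0:
            x_adv[i] = lo  # decreasing this coordinate increases the loss
    delta = [a - b for a, b in zip(x_adv, x)]
    return x_adv, delta


# Toy input and loss gradient (made-up numbers for illustration only).
x = [0.2, 0.9, 0.5, 0.1]
grad = [0.3, -2.0, 0.05, 1.5]
x_adv, delta = one_step_l0_perturbation(x, grad, k=2)
nonzeros = sum(1 for d in delta if d != 0.0)
```

In this sketch the set of perturbed coordinates is fixed by one gradient and never revised, so a poor choice of locations cannot be corrected afterwards, unlike in a multi-step attack.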

To address this issue, we adopt soft labels and the trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate our method can overcome the challenge of CO, achieve state-of-the-art performance, and narrow the performance gap between $1$-step and multi-step adversarial training against sparse attacks.
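
As a rough illustration of how soft labels and a trade-off loss can be combined, the sketch below uses label smoothing as the soft label and a TRADES-style objective (cross-entropy on the clean input plus a weighted KL term between clean and adversarial predictions) as the trade-off loss. Both are assumptions made for illustration; the abstract does not specify which soft-label construction or loss form is used. The names `smooth_label` and `tradeoff_loss` and the values of `alpha` and `beta` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def smooth_label(target, num_classes, alpha):
    """Label smoothing, one common soft-label construction (assumed here)."""
    off = alpha / num_classes
    return [off + (1.0 - alpha) * (1.0 if c == target else 0.0)
            for c in range(num_classes)]


def tradeoff_loss(clean_logits, adv_logits, target, alpha=0.1, beta=6.0):
    """TRADES-style loss with a soft label (illustrative, not the paper's exact loss).

    loss = CE(soft_label, p_clean) + beta * KL(p_clean || p_adv)
    alpha and beta are arbitrary illustrative values.
    """
    lp_clean = log_softmax(clean_logits)
    lp_adv = log_softmax(adv_logits)
    soft = smooth_label(target, len(clean_logits), alpha)
    # Cross-entropy between the soft label and the clean prediction.
    ce = -sum(t * lp for t, lp in zip(soft, lp_clean))
    # KL divergence from the clean prediction to the adversarial prediction.
    kl = sum(math.exp(lc) * (lc - la) for lc, la in zip(lp_clean, lp_adv))
    return ce + beta * kl


# Example with made-up logits for a 3-class problem.
loss = tradeoff_loss([2.0, 0.5, -1.0], [0.3, 1.2, -0.5], target=0)
```

The KL term vanishes when the clean and adversarial predictions agree, so `beta` controls how strongly the objective penalizes prediction changes under perturbation relative to clean accuracy.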