Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

Huanming Shen, Baizhou Huang, Xiaojun Wan

Peking University· University of Electronic Science and Technology

machine learning security llm watermarking

Abstract

Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks.

This work expands the trade-off boundary by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support the detection.

Based on the redundancy, we propose a watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing the resilience against scrubbing attacks without compromising robustness to spoofing.

Our code will be available at https://github.com/Hearum/SeekWM.