4 papers across 3 sessions
We detect and remove backdoor samples in MLLM fine-tuning by identifying abnormal attention entropy patterns without requiring clean data or model modifications.
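The paper's own pipeline isn't given here, but the core idea of flagging samples with abnormal attention entropy can be sketched as follows. Everything in this snippet is illustrative: the z-score threshold, the toy attention rows, and the helper names are assumptions, not the paper's method.

```python
import math

def attention_entropy(attn_row):
    # Shannon entropy of one attention distribution (assumed to sum to 1);
    # a backdoor trigger often makes attention collapse onto a few tokens,
    # producing unusually low entropy.
    return -sum(p * math.log(p) for p in attn_row if p > 0)

def flag_outliers(entropies, z_thresh=2.0):
    # Flag samples whose attention entropy deviates strongly from the cohort
    # (simple z-score rule; a hypothetical stand-in for the paper's detector).
    n = len(entropies)
    mean = sum(entropies) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in entropies) / n) or 1e-12
    return [abs((e - mean) / std) > z_thresh for e in entropies]

# Toy cohort: 8 clean samples with uniform attention, 1 poisoned sample
# whose attention is sharply peaked on a single (trigger) token.
clean = [[0.25, 0.25, 0.25, 0.25]] * 8
poisoned = [[0.97, 0.01, 0.01, 0.01]]
entropies = [attention_entropy(a) for a in clean + poisoned]
flags = flag_outliers(entropies)  # only the last sample is flagged
```

Note that this works without any clean reference data: the cohort's own entropy statistics define "abnormal", which matches the summary's claim of needing no clean data or model modifications.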
This paper proposes LoSplit, a training-time defense that detects and mitigates graph backdoors by dynamically analyzing early loss divergence.
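LoSplit's exact procedure isn't detailed here, but the general mechanism it builds on, that backdoored samples are fit much faster than clean ones, so their loss diverges from the cohort early in training, can be sketched with a minimal split rule. The gap-based cut, the epoch index, and the toy loss curves below are all assumptions for illustration, not the paper's algorithm.

```python
def split_by_early_loss(loss_history, epoch=2):
    # loss_history: one list of per-epoch losses per training sample.
    # Backdoored samples tend to be memorized quickly, so their loss
    # collapses in early epochs; split the cohort at the largest gap
    # in epoch-`epoch` losses and treat the low-loss side as suspect.
    losses = sorted((hist[epoch], i) for i, hist in enumerate(loss_history))
    gaps = [(losses[j + 1][0] - losses[j][0], j) for j in range(len(losses) - 1)]
    _, cut = max(gaps)  # widest gap separates the two loss regimes
    return {i for _, i in losses[:cut + 1]}

# Toy curves: three clean samples descend slowly; the poisoned sample
# (index 3) has already collapsed to near-zero loss by epoch 2.
history = [
    [2.1, 1.7, 1.30],
    [2.0, 1.6, 1.25],
    [2.2, 1.8, 1.35],
    [2.0, 0.6, 0.10],
]
suspect = split_by_early_loss(history)  # → {3}
```

A one-dimensional gap split is the simplest possible divider; a real defense would likely use a more robust criterion (e.g. clustering loss trajectories), but the early-divergence signal it exploits is the same.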