This paper proposes Panacea, a post-fine-tuning method that mitigates the safety degradation caused by harmful fine-tuning in large language models, preserving safety alignment without sacrificing downstream performance across different tasks and models.