7 papers across 3 sessions
We find that rerouting spurious shortcuts during adapter training enables robust disentanglement in adapter-based text-to-image generation.
A new LLM jailbreak objective that enables more nuanced control over jailbroken responses, exploits undergeneralization of safety alignment, and improves success rates of existing jailbreaks from 14% to 80%.