1 paper across 1 session
Presents a unified theoretical framework that treats LLM alignment as divergence estimation between aligned and unaligned distributions; introduces KLDO, an alignment method; and advocates compliance-refusal datasets to improve safety.