Quantile-Guided Alignment (QA) is a framework for aligning language models along multiple dimensions at once. It imposes constraints on quantiles of the reward distribution for each of several performance dimensions, with the aim of reducing catastrophic risks.
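The core idea of a per-dimension quantile constraint can be illustrated with a small check over sampled rewards. This is a minimal sketch under stated assumptions, not the QA training procedure: it assumes each dimension comes with a quantile level tau and a threshold, and it only tests whether the lower empirical tau-quantile of sampled rewards clears that threshold. It does not show how a policy would be optimized to satisfy the constraints. The function names, dimension names, quantile levels, and thresholds are all hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def empirical_quantile(samples: Sequence[float], tau: float) -> float:
    """Lower empirical tau-quantile: the smallest sample x with F_n(x) >= tau."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(samples)
    # Subtract a tiny epsilon so floating-point error (e.g. 0.3 * 10 == 3.0000000000000004)
    # does not push the index one position too far; clamp at 0 for very small tau.
    k = max(0, math.ceil(tau * len(ordered) - 1e-9) - 1)
    return ordered[k]


def check_quantile_constraints(
    rewards: Dict[str, Sequence[float]],
    constraints: Dict[str, Tuple[float, float]],
) -> Dict[str, bool]:
    """Hypothetical check: for each dimension d with (tau, threshold),
    report whether q_tau(R_d) >= threshold.

    A low tau (e.g. 0.2) targets the lower tail of the reward distribution,
    which is where rare but severe failures show up.
    """
    results: Dict[str, bool] = {}
    for dim, (tau, threshold) in constraints.items():
        results[dim] = empirical_quantile(rewards[dim], tau) >= threshold
    return results


if __name__ == "__main__":
    # Hypothetical per-dimension reward samples for one batch of responses.
    rewards = {
        "helpfulness": [0.2, 0.5, 0.7, 0.9, 0.95],
        "harmlessness": [-1.0, 0.6, 0.8, 0.9, 1.0],
    }
    # Hypothetical constraints: the 20th percentile of each dimension must clear a threshold.
    constraints = {
        "helpfulness": (0.2, 0.1),
        "harmlessness": (0.2, 0.0),
    }
    print(check_quantile_constraints(rewards, constraints))
    # → {'helpfulness': True, 'harmlessness': False}
```

In this example the mean harmlessness reward is positive (about 0.46), so a constraint on the average alone would pass, while the 20th-percentile constraint flags the single strongly negative sample. That contrast is the intuition behind constraining quantiles rather than means when the concern is catastrophic tail behavior.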