We study how to compute an optimal large language model (LLM) policy for a constrained alignment problem.