We study how to compute an optimal large language model (LLM) policy for a constrained alignment problem.