2 papers across 2 sessions
We introduce CLEVER, a hand-curated benchmark for verified code generation in Lean. It requires full formal specs and proofs. No few-shot method solves all stages, making it a strong testbed for synthesis and formal reasoning.
A framework that synthesizes a tuple of optimal control policies for multi-agent systems that maximizes the probability of satisfying a desired hyperproperty.