6 papers across 3 sessions
We introduce a novel, scalable framework for evaluating compositional generalization, leverage it to evaluate more than 5k models, and propose a family of neural models that push the Pareto frontier on this task.
We propose CELEBI, a self-supervised communication game that promotes compositionality via three novel mechanisms for modulating expressivity and efficiency.
Scaling neural networks leads to compositional generalization if the training distribution sufficiently covers the task space.
Reasoning models can learn rules from simple examples and apply them to solve complex ones. We identify a broad class of everyday reasoning rules that current models cannot learn and build large datasets requiring such rule learning.
We introduce Ineq-Comp, a benchmark for testing compositional reasoning in formal inequality proving. Simple human-intuitive transformations cause major accuracy drops, showing that current LLM provers lack robust compositional generalization.