3 papers across 2 sessions
SAGE‑Eval is the first benchmark to test whether frontier LLMs robustly generalize critical safety knowledge to novel situations; we show that the strongest model we tested passed only 58% of the safety facts evaluated.
Reasoning models can learn rules from simple examples and then apply those rules to solve complex ones. We identify a broad class of everyday reasoning rules that current models cannot learn, and we build large datasets that require such rule learning.