1 paper across 1 session
We propose a cognitive-science-inspired framework and benchmark to systematically evaluate the learning abilities of large language models.