1 paper across 1 session
A framework and benchmark to evaluate language models' reasoning on imperfect tabular data