1 paper across 1 session
We conduct a user study to evaluate how well language models help humans internalize the models' reasoning, revealing that strong model performance alone doesn't guarantee effective reasoning transfer.