2 papers across 2 sessions
We conduct a user study to evaluate how well language models help humans internalize their reasoning, revealing that strong model performance alone doesn't guarantee effective reasoning transfer.