2 papers across 1 session
We show that penalizing certain chain-of-thought (CoT) reasoning causes LLMs to learn encoding schemes that generalize to unseen examples.