PhD student, Queen Mary, University of London
1 paper at NeurIPS 2025
We show that penalizing certain chain-of-thought (CoT) reasoning causes LLMs to learn encoding schemes that generalize to unseen examples.