Member of Technical Staff, Cohere
1 paper at NeurIPS 2025
Although loss decreases monotonically during LLM training, the representations pass through distinct geometric phases across pretraining and post-training, and these phases determine when and how the model acquires memorization or generalization capabilities.