Associate Professor, Mila - Quebec Artificial Intelligence Institute
2 papers at NeurIPS 2025
We present POSSM, a novel architecture that combines input cross-attention with a recurrent state-space model, achieving competitive accuracy, fast inference, and efficient generalization for real-time neural decoding applications.
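The released POSSM code isn't reproduced here, but the two named ingredients can be illustrated with a minimal PyTorch sketch: learned queries cross-attend over per-unit spike embeddings at each time bin, and the pooled summary drives a diagonal recurrent state-space update that can be stepped one bin at a time for streaming inference. All names and design choices below (CrossAttnSSMDecoder, DiagonalSSM, the mean-pooling and linear readout) are illustrative assumptions, not the paper's actual architecture.

```python
from typing import Optional

import torch
import torch.nn as nn


class DiagonalSSM(nn.Module):
    """Minimal diagonal recurrent state-space layer: h_t = a * h_{t-1} + b * u_t."""

    def __init__(self, d_model: int):
        super().__init__()
        self.log_decay = nn.Parameter(torch.zeros(d_model))  # a = exp(-softplus(.)) in (0, 1)
        self.b = nn.Parameter(torch.ones(d_model))

    def forward(self, u: torch.Tensor, h: Optional[torch.Tensor] = None) -> torch.Tensor:
        # u: (batch, time, d_model). A sequential scan keeps the recurrence
        # explicit; production SSMs would use a parallel scan instead.
        a = torch.exp(-nn.functional.softplus(self.log_decay))
        if h is None:
            h = torch.zeros(u.shape[0], u.shape[-1], device=u.device)
        outs = []
        for t in range(u.shape[1]):
            h = a * h + self.b * u[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)


class CrossAttnSSMDecoder(nn.Module):
    """Hypothetical POSSM-style block: per-bin cross-attention over neural
    units feeds a recurrent SSM rolled forward one time bin at a time."""

    def __init__(self, n_units: int, d_model: int = 128, n_heads: int = 4,
                 n_queries: int = 4, n_outputs: int = 2):
        super().__init__()
        self.unit_emb = nn.Parameter(torch.randn(n_units, d_model) * 0.02)
        self.rate_proj = nn.Linear(1, d_model)        # embed each unit's spike count
        self.queries = nn.Parameter(torch.randn(n_queries, d_model) * 0.02)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = DiagonalSSM(d_model)
        self.readout = nn.Linear(d_model, n_outputs)  # e.g. 2-D cursor velocity

    def forward(self, spikes: torch.Tensor) -> torch.Tensor:
        # spikes: (batch, time, n_units) binned counts
        B, T, N = spikes.shape
        # one token per unit per bin: identity embedding + activity embedding
        tokens = self.unit_emb + self.rate_proj(spikes.reshape(B * T, N, 1))
        q = self.queries.expand(B * T, -1, -1)
        summary, _ = self.cross_attn(q, tokens, tokens)  # (B*T, n_queries, d_model)
        u = summary.mean(dim=1).reshape(B, T, -1)        # pool queries: one vector per bin
        return self.readout(self.ssm(u))                 # (batch, time, n_outputs)


decoder = CrossAttnSSMDecoder(n_units=96)
velocity = decoder(torch.randn(2, 50, 96).clamp(min=0))  # -> (2, 50, 2)
```

The recurrence is what buys fast inference: unlike a transformer decoder, each new time bin costs one attention pass over the current units plus a constant-size state update, with no growing context window.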
While the loss decreases monotonically during LLM training, the model's internal representations undergo distinct geometric phases across pretraining and post-training, and these phases determine when and how the model acquires memorization or generalization capabilities.
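This summary does not specify the paper's geometric measures; one generic way to probe representation geometry across checkpoints is the participation ratio (an effective rank) of the hidden-state covariance spectrum. The sketch below is a hypothetical probe of that kind, not the paper's actual metric.

```python
import torch


def effective_rank(h: torch.Tensor) -> float:
    """Participation ratio of the covariance spectrum of hidden states.

    h: (n_samples, d_model) hidden states collected from one checkpoint.
    Returns a value in [1, d_model]: flat spectra score high, collapsed low.
    """
    h = h - h.mean(dim=0, keepdim=True)
    cov = h.T @ h / (h.shape[0] - 1)
    eig = torch.linalg.eigvalsh(cov).clamp(min=0)
    return float(eig.sum() ** 2 / (eig ** 2).sum())


# Synthetic demo: an isotropic cloud vs. a nearly one-dimensional one.
iso = torch.randn(1000, 64)
collapsed = torch.randn(1000, 1) * torch.randn(1, 64) + 0.01 * torch.randn(1000, 64)
print(effective_rank(iso))        # close to 64
print(effective_rank(collapsed))  # close to 1
```

Tracking such a statistic over pretraining and post-training checkpoints is one way phase-like geometric changes could show up even while the loss curve stays smooth.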