PhD student, Mila - Quebec AI Institute, Université de Montréal
4 papers at NeurIPS 2025
We stabilize gradients when training increasingly deep reinforcement learning agents by combining a second-order optimizer with residual connections.
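A toy illustration (not the paper's method) of why the residual part helps: by the chain rule, stacking plain layers multiplies their per-layer derivatives, which shrinks gradients exponentially with depth, while a residual layer's derivative stays near 1 so the product stays usable. The slopes and depth below are made-up numbers.

```python
# Toy chain-rule demo of gradient stability under depth.
depth = 20

# Plain layer g(x) = 0.5 * x has derivative 0.5; twenty of them vanish.
plain_slope = 0.5
plain_grad = plain_slope ** depth          # ~9.5e-7: vanishing gradient

# Residual layer r(x) = x + f(x) has derivative 1 + f'(x); with a small
# f'(x) = -0.02 the product through depth stays close to 1.
res_slope = 1.0 + (-0.02)
res_grad = res_slope ** depth              # ~0.67: magnitude preserved

print(f"plain: {plain_grad:.2e}  residual: {res_grad:.2f}")
```

The identity path contributes the `1` in `1 + f'(x)`, which is what keeps the product of derivatives from collapsing; the second-order optimizer in the paper addresses a complementary failure mode and is not sketched here.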
Measuring neuron activity via activation values is unreliable in complex agents, as these values do not reflect a neuron's true learning capacity. We introduce GraMa, which offers robust quantification of that capacity and resetting guidance across diverse network architectures.
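A minimal sketch of the general idea, not the paper's implementation: score each neuron by the running magnitude of its gradients rather than its activations, then flag neurons whose normalized score falls below a threshold as reset candidates. The names, the mean-normalization, and the threshold `tau` are illustrative assumptions.

```python
def grad_magnitude_scores(grad_history):
    """Average per-neuron gradient magnitude over a window.

    grad_history[t][i] = |gradient| seen by neuron i at step t.
    """
    n_neurons = len(grad_history[0])
    n_steps = len(grad_history)
    return [sum(step[i] for step in grad_history) / n_steps
            for i in range(n_neurons)]

def neurons_to_reset(scores, tau=0.1):
    # Normalize by the layer mean so the threshold is scale-free
    # (an illustrative choice, not the paper's).
    mean = sum(scores) / len(scores)
    return [i for i, s in enumerate(scores) if s / (mean + 1e-8) < tau]

# Neuron 1 receives almost no gradient signal across steps,
# even though its activation might look perfectly healthy.
history = [
    [0.9, 0.001, 0.5],
    [1.1, 0.002, 0.4],
]
scores = grad_magnitude_scores(history)
print(neurons_to_reset(scores))  # → [1]
```

The point of the gradient-based score is that a neuron with large activations can still be unable to learn if no gradient flows through it, which an activation-based dormancy metric would miss.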
We improve the speed and performance of LLM post-training via a new asynchronous RL approach, leveraging an off-policy objective, a replay buffer, and tailored sampling strategies.
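A hedged, simplified sketch of the overall pattern (not the paper's algorithm): an asynchronous generator fills a replay buffer with samples tagged with the behavior policy's log-probability, and the trainer consumes stale batches, correcting for off-policyness with a clipped importance weight. The scalar log-probs, clip value, and REINFORCE-style surrogate are placeholder assumptions.

```python
import collections
import math
import random

Sample = collections.namedtuple("Sample", "logp_behavior reward")

# Replay buffer shared between the generator and the trainer
# (in a real system these would run in separate processes).
buffer = collections.deque(maxlen=1024)

# Generator side: store the behavior policy's log-prob with each reward
# so the trainer can correct for policy lag later.
for _ in range(64):
    buffer.append(Sample(logp_behavior=-1.0, reward=random.uniform(0.0, 1.0)))

def off_policy_loss(batch, logp_current, clip=2.0):
    """Importance-weighted surrogate loss over a stale replay batch."""
    total = 0.0
    for s in batch:
        # rho = pi_current / pi_behavior, clipped for stability.
        rho = min(math.exp(logp_current - s.logp_behavior), clip)
        total += -rho * s.reward  # REINFORCE-style surrogate
    return total / len(batch)

# Trainer side: sample a batch regardless of how stale it is;
# a single scalar logp_current stands in for a full model forward pass.
batch = random.sample(list(buffer), 16)
print(off_policy_loss(batch, logp_current=-1.2))
```

Decoupling generation from training this way keeps the GPUs that produce rollouts busy while the trainer updates, which is the source of the speedup; the importance weight and buffer sampling strategy control how much policy lag the objective can tolerate.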