Adjunct Professor, Université Laval
2 papers at NeurIPS 2025
We provide statistically rigorous guidelines for training interactive, multi-step LLM agents, covering optimal compute allocation, generalization, and hyperparameter settings.
This paper introduces an unsupervised method for disentangling interpretable, behavior-mediating latent concepts in language model activations, under the assumption that sparse changes to these concepts induce changes in model behavior.