Full Professor, University of California, San Diego
3 papers at NeurIPS 2025
Transformer, Mamba, and RWKV language models show consistent patterns of behavioral change over the course of training.
We identify several factors that lead to token premium effects in monolingual tokenizers and provide two interventions that significantly reduce tokenizer inequities.
We find bigram subnetworks in Transformer language models that are critical to model performance.