Assistant Professor, Toyota Technological Institute at Chicago
3 papers at NeurIPS 2025
We show that letting transformers' depth grow slightly with the input length increases their expressive power, under standard complexity-theoretic conjectures.
We propose a simple method for measuring the critical batch size in language model pretraining that alleviates issues with existing approaches, and we show that it can be used to train language models with fewer gradient steps in practice.
We exactly characterize the expressive power of transformers with padding tokens as $\mathsf{TC}^0$, and we give a corresponding characterization for transformers with both looping and padding.