3 papers across 3 sessions
Attention heads in text-generative models specialize in semantic and visual concepts. Leveraging this specialization, we can reliably suppress or enhance specific attributes in both language and vision-language tasks.
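As a minimal illustration of this kind of head-level intervention, the sketch below rescales one head's output in a toy self-attention block; the module, head index, and scale factor are hypothetical stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadScaledAttention(nn.Module):
    """Toy multi-head self-attention with one head's output rescaled.

    `target_head` and `scale` are illustrative: scale=0.0 suppresses
    whatever concept that head encodes, scale>1.0 enhances it.
    """

    def __init__(self, embed_dim=64, num_heads=4, target_head=2, scale=0.0):
        super().__init__()
        self.h, self.d = num_heads, embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)
        self.target_head, self.scale = target_head, scale

    def forward(self, x):  # x: (batch, seq, embed_dim)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, head_dim).
        q, k, v = (t.view(b, s, self.h, self.d).transpose(1, 2)
                   for t in (q, k, v))
        o = F.scaled_dot_product_attention(q, k, v)
        # Per-head gain: 1 everywhere except the targeted head.
        gain = torch.ones(self.h, device=x.device)
        gain[self.target_head] = self.scale
        o = o * gain.view(1, self.h, 1, 1)
        return self.out(o.transpose(1, 2).reshape(b, s, -1))

attn = HeadScaledAttention(scale=0.0)   # fully suppress head 2
out = attn(torch.randn(1, 10, 64))      # (1, 10, 64)
```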
We provide a method for accurate end-to-end FP4 training of Large Language Models.
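For intuition about what FP4 training involves at the tensor level, here is a minimal sketch of fake-quantization onto the FP4 (E2M1) value grid with a straight-through estimator; the per-tensor scaling and the STE are generic ingredients of quantized training, not necessarily the paper's recipe.

```python
import torch

# Representable magnitudes of FP4 (E2M1); the sign bit is handled separately.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: torch.Tensor) -> torch.Tensor:
    """Round x onto a scaled FP4 grid, with a straight-through estimator."""
    scale = x.abs().max().clamp(min=1e-8) / FP4_GRID[-1]   # per-tensor scale
    mag = (x / scale).abs().unsqueeze(-1)                  # (..., 1)
    nearest = FP4_GRID[(mag - FP4_GRID).abs().argmin(dim=-1)]
    q = torch.sign(x) * nearest * scale
    # Straight-through estimator: forward uses q, backward sees identity.
    return x + (q - x).detach()

w = torch.randn(4, 4, requires_grad=True)
fake_quant_fp4(w).sum().backward()
print(w.grad)   # all ones: gradients pass straight through the rounding
```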
We characterize the structure of embeddings obtained via gradient descent, showing that the attention mechanism provably selects important tokens.
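As a standard back-of-the-envelope version of this selection effect (generic softmax-attention notation, not the paper's actual theorem): once the important token's score leads every other score by a margin γ, its attention weight is exponentially close to 1.

```latex
\alpha_i = \frac{\exp(q^\top k_i)}{\sum_{j=1}^{n} \exp(q^\top k_j)},
\qquad
q^\top k_{i^\star} \ge q^\top k_j + \gamma \;\; (\forall j \ne i^\star)
\;\Longrightarrow\;
\alpha_{i^\star} \ge \frac{1}{1 + (n-1)\, e^{-\gamma}} .
```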