Associate Professor, Technion - Israel Institute of Technology
6 papers at NeurIPS 2025
We introduce an alias‑free ViT that combines anti‑aliasing with linear cross‑covariance attention to achieve fractional shift invariance, delivering ~99% prediction consistency under sub‑pixel shifts and stronger translation robustness at competitive accuracy.
We derive simple generalization bounds for Markov training processes at any time during training, and then apply them to training with Langevin dynamics to improve existing bounds.
We demonstrate, for the first time, fully quantized training of a 7B LLM using the FP4 format.
We study greedy task orderings in continual learning that maximize dissimilarity between consecutive tasks, and compare their performance to random orderings both analytically and empirically.
We prove that using regularization with either fixed or increasing strength yields near-optimal and optimal worst-case expected loss rates in realizable continual regression under random task orderings.
Training LLMs with tensor parallelism without fully synchronizing activations, accelerating both training and inference.