Associate Professor, Technion - Israel Institute of Technology
6 papers at NeurIPS 2025
We introduce an alias‑free ViT that combines anti‑aliasing with linear cross‑covariance attention to achieve fractional shift invariance, delivering ~99% prediction consistency under sub‑pixel shifts and stronger translation robustness at competitive accuracy.
We derive simple generalization bounds for Markov training processes at any time during training, and then apply them to training with Langevin dynamics to improve existing bounds.
We demonstrate, for the first time, fully quantized training of a 7B LLM using the FP4 format.
We study greedy task orderings in continual learning that maximize dissimilarity between consecutive tasks, and compare their performance to random orderings both analytically and empirically.
We prove that using regularization with either fixed or increasing strength yields near-optimal and optimal worst-case expected loss rates in realizable continual regression under random task orderings.
Training LLMs with tensor parallelism without fully synchronizing activations, accelerating both training and inference.