6 papers across 3 sessions
We study the stochastic shortest path problem with sparse adversarial costs and, under known transitions, characterize the minimax regret achieved by online mirror descent (OMD) with a novel $\ell_r$-norm regularizer for $r \in [1,2]$.
We propose a global pruning framework that efficiently learns unstructured sparsity for LLMs.
FPSAttention is a training-aware FP8 quantization and sparsity co-design for video diffusion models that achieves up to 4.96× speedup without quality loss through 3D tile-granularity alignment, denoising-step adaptation, and hardware-efficient kernels.
We investigate new scaling laws that predict LLM performance when training over quantized or sparse representations.
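As a hypothetical illustration of the kind of fit such scaling laws involve, the sketch below recovers a power-law exponent from synthetic loss measurements. The functional form $L(N) = a N^{-b}$, the coefficient values, and the variable names are assumptions for demonstration only, not the paper's actual model:

```python
import numpy as np

# Hypothetical illustration: fit a power-law scaling curve L(N) = a * N**(-b)
# to synthetic loss measurements via log-log linear regression.
# All values here are assumed for demonstration, not from the paper.

rng = np.random.default_rng(0)
N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])   # model sizes (parameters)
true_a, true_b = 50.0, 0.3                # assumed power-law coefficients
loss = true_a * N ** (-true_b) * np.exp(rng.normal(0, 0.01, N.size))

# In log space the power law is linear: log L = log a - b * log N
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
b_hat, a_hat = -slope, np.exp(intercept)
print(f"fitted exponent b = {b_hat:.3f}, prefactor a = {a_hat:.1f}")
```

With quantized or sparse training, a scaling law of this shape would typically be refit per representation to see how the exponent and prefactor shift.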