3 papers across 3 sessions
Influence Distillation is a mathematically justified data selection method for LLM fine-tuning that assigns optimal weights to training samples, achieving performance on par with or better than the state of the art while being substantially faster.
We present a method for accurate end-to-end FP4 training of large language models.
We investigate new scaling laws that predict how LLMs scale when trained over quantized or sparse representations.