2 papers across 2 sessions
We present a method for accurate, end-to-end FP4 training of large language models.
FALQON accelerates LoRA fine-tuning by up to 3$\times$ by merging the adapters into an FP8-quantized backbone, which removes the redundant quantization overhead incurred by the small adapter matrices. A sketch of the merging idea follows.
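To make the merging idea concrete, here is a minimal PyTorch sketch (assuming PyTorch ≥ 2.1 for `torch.float8_e4m3fn`). The `quantize_fp8` helper and the `MergedLoRALinear` module are hypothetical names introduced for illustration; the sketch shows only the forward-pass cost structure of a merged backbone, not FALQON's actual training procedure.

```python
import torch
import torch.nn as nn


def quantize_fp8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical per-tensor FP8 (E4M3) quantization: scale into the
    representable range, cast to float8, and return (quantized, scale)."""
    scale = w.abs().max().clamp(min=1e-12) / 448.0  # 448 = max finite E4M3 value
    q = (w / scale).to(torch.float8_e4m3fn)
    return q, scale


class MergedLoRALinear(nn.Module):
    """Sketch of the merged-adapter idea: instead of an FP8 backbone GEMM
    plus two extra small GEMMs (and their quantization steps) for the LoRA
    path, fold W + B @ A into a single quantized weight."""

    def __init__(self, weight: torch.Tensor, rank: int = 16):
        super().__init__()
        out_f, in_f = weight.shape
        self.register_buffer("base", weight)  # frozen full-precision backbone
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.merge()  # build the initial merged FP8 weight

    @torch.no_grad()
    def merge(self):
        """Fold the current adapter into the backbone and quantize once."""
        merged = self.base + self.B @ self.A
        self.q_weight, self.scale = quantize_fp8(merged)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One dequantized GEMM stands in for a true FP8 kernel; no separate
        # low-rank matmuls (and no per-step quantization of A and B) occur.
        w = self.q_weight.to(x.dtype) * self.scale
        return x @ w.t()


layer = MergedLoRALinear(torch.randn(256, 512))
y = layer(torch.randn(4, 512))
print(y.shape)  # torch.Size([4, 256])
```

Note that this sketch only illustrates why merging leaves a single quantized GEMM per layer; in actual training the adapter gradient path must still be preserved between merges, which is beyond the scope of this example.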