Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#705
PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs
Abstract
This paper introduces PaZO, a preconditioned accelerated zeroth-order optimization algorithm for fine-tuning large language models (LLMs).
First, we theoretically demonstrate the necessity of preconditioning in zeroth-order optimization, proving that zeroth-order stochastic gradient descent (ZO-SGD) alone fails to achieve the ideal convergence rate. Building on this, we propose Preconditioned Simultaneous Perturbation Stochastic Approximation (PSPSA) and a theoretical version of PaZO, and show that choosing the order of the preconditioner as prescribed by PSPSA yields an improved convergence rate for PaZO.
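For context, ZO-SGD builds on the standard two-point SPSA gradient estimate; the display below shows this generic estimator and the kind of preconditioned update it motivates. This is background notation only, not the paper's exact PSPSA construction or its choice of preconditioner order.

    \hat{\nabla}\mathcal{L}(\theta) \;=\; \frac{\mathcal{L}(\theta + \epsilon u) - \mathcal{L}(\theta - \epsilon u)}{2\epsilon}\, u, \quad u \sim \mathcal{N}(0, I_d),
    \qquad
    \theta_{t+1} \;=\; \theta_t - \eta\, P_t^{-1}\, \hat{\nabla}\mathcal{L}(\theta_t),

where P_t is a preconditioner (for instance, a diagonal curvature estimate) and plain ZO-SGD corresponds to P_t = I.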
Moreover, we design a practical version of PaZO that stabilizes training via a diagonal Hessian estimate and a moving-average technique.
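As a rough illustration of this kind of practical scheme (a sketch only, not the paper's exact PaZO algorithm), the snippet below combines a MeZO-style two-point SPSA estimate with a diagonal preconditioner maintained as a moving average of squared gradient estimates, used here as a stand-in for a diagonal Hessian estimate. The function name and all hyperparameters are illustrative assumptions.

    import torch

    def preconditioned_zo_step(params, loss_fn, precond, eps=1e-3, lr=1e-6, beta=0.99, delta=1e-8):
        """One illustrative preconditioned zeroth-order step (sketch, not the paper's exact method).

        params  : list of torch.nn.Parameter, updated in place
        loss_fn : callable returning a scalar loss for the current parameters
        precond : list of tensors holding a moving-average diagonal curvature estimate
        """
        # Fix a seed so the same random perturbation u can be replayed later.
        seed = torch.randint(0, 2**31 - 1, (1,)).item()

        def perturb(scale):
            torch.manual_seed(seed)
            for p in params:
                p.data.add_(scale * eps * torch.randn_like(p))

        # Two-point SPSA evaluations at theta + eps*u and theta - eps*u.
        perturb(+1.0)
        loss_plus = loss_fn()
        perturb(-2.0)
        loss_minus = loss_fn()
        perturb(+1.0)  # restore the original parameters

        grad_scale = (loss_plus - loss_minus) / (2.0 * eps)

        # Preconditioned update; the diagonal estimate is refreshed by a moving average.
        torch.manual_seed(seed)
        for p, h in zip(params, precond):
            u = torch.randn_like(p)
            g = grad_scale * u                          # ZO gradient estimate along u
            h.mul_(beta).add_((1.0 - beta) * g * g)     # moving-average diagonal curvature proxy
            p.data.add_(-lr * g / (h.sqrt() + delta))   # preconditioned step

Replaying the perturbation from the stored seed avoids keeping a copy of u in memory, which is the usual trick for making zeroth-order fine-tuning memory-efficient at LLM scale.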
Extensive experiments on diverse downstream tasks with models such as RoBERTa-large and OPT demonstrate PaZO's effectiveness: compared with other zeroth-order baselines, PaZO achieves better performance across models and tasks.