Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#705

PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs

NeurIPS OpenReview

Abstract

This paper introduces PaZO, a preconditioned accelerated zeroth-order optimization algorithm for fine-tuning large language models (LLMs).
First, we theoretically demonstrate the necessity of preconditioning in zeroth-order optimization by proving that zeroth-order stochastic gradient descent (ZO-SGD) alone fails to achieve the ideal convergence rate. Building on this, we propose Preconditioned Simultaneous Perturbation Stochastic Approximation (PSPSA) and a theoretical version of PaZO, and demonstrate that setting the order of the preconditioner as in PSPSA yields an improved convergence rate for PaZO.
Moreover, we design a practical version of PaZO that stabilizes training via a diagonal Hessian estimate and a moving-average technique.
Extensive experiments on diverse downstream tasks with models such as RoBERTa-large and OPT demonstrate PaZO’s effectiveness: it outperforms other zeroth-order baselines across models and tasks.
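
To make the role of preconditioning concrete, the following is a minimal sketch of a generic preconditioned two-point SPSA step on a toy ill-conditioned quadratic. It is not the PaZO algorithm: the abstract does not give the update rule, so the function names (`spsa_grad`, `run`), the toy loss, the step sizes, and the exponent of -1/2 on the diagonal Hessian are all assumptions made for illustration. The exact Hessian diagonal `h` stands in for the diagonal Hessian estimate, and the moving average and acceleration used by PaZO are omitted.

```python
import numpy as np

# Toy ill-conditioned quadratic: f(theta) = 0.5 * sum(h * theta**2).
# `h` stands in for a diagonal Hessian estimate (here it is exact).
h = np.logspace(0, 2, 10)


def loss(theta):
    return 0.5 * np.sum(h * theta**2)


def spsa_grad(theta, rng, scale, eps=1e-3):
    """Two-point zeroth-order estimate along u = scale * z, z ~ N(0, I).

    In expectation this is approximately diag(scale**2) @ grad, so `scale`
    acts as a per-coordinate preconditioner. Only loss values are used.
    """
    u = scale * rng.standard_normal(theta.shape)
    slope = (loss(theta + eps * u) - loss(theta - eps * u)) / (2 * eps)
    return slope * u


def run(precondition, steps=500, seed=0):
    """Run plain ZO-SGD or its preconditioned variant; return the final loss."""
    rng = np.random.default_rng(seed)
    theta = np.ones_like(h)
    d = theta.size
    if precondition:
        # Assumed exponent: with scale = h**-0.5 the expected step
        # points along diag(1/h) @ grad, a Newton-like direction.
        scale = h ** -0.5
        lr = 1.0 / (d + 2)
    else:
        # Plain ZO-SGD needs a step size limited by the largest curvature.
        scale = np.ones_like(h)
        lr = 1.0 / ((d + 2) * h.max())
    for _ in range(steps):
        theta = theta - lr * spsa_grad(theta, rng, scale)
    return loss(theta)
```

Comparing `run(True)` with `run(False)` shows the effect on this toy problem: the plain variant is held back by the low-curvature coordinates because its step size is capped by the largest curvature, while the rescaled perturbation treats all coordinates alike.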