Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#705
PaZO: Preconditioned Accelerated Zeroth-Order Optimization for Fine-Tuning LLMs
Abstract
This paper introduces PaZO, a preconditioned accelerated zeroth-order optimization algorithm for fine-tuning large language models (LLMs).
First, we theoretically demonstrate the necessity of preconditioning in zeroth-order optimization, proving that zeroth-order stochastic gradient descent (ZO-SGD) alone fails to achieve the ideal convergence rate. Building on this, we propose Preconditioned Simultaneous Perturbation Stochastic Approximation (PSPSA) and a theoretical version of PaZO, and show that choosing the order of the preconditioner as prescribed by PSPSA yields an improved convergence rate for PaZO.
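For context, ZO-SGD builds on the standard two-point SPSA gradient estimate; the display below shows this generic estimator and the kind of preconditioned update it motivates. This is background notation only, not the paper's exact PSPSA construction or its choice of preconditioner order.

    \hat{\nabla}\mathcal{L}(\theta) \;=\; \frac{\mathcal{L}(\theta + \epsilon u) - \mathcal{L}(\theta - \epsilon u)}{2\epsilon}\, u, \quad u \sim \mathcal{N}(0, I_d),
    \qquad
    \theta_{t+1} \;=\; \theta_t - \eta\, P_t^{-1}\, \hat{\nabla}\mathcal{L}(\theta_t),

where P_t is a preconditioner (for instance, a diagonal curvature estimate) and plain ZO-SGD corresponds to P_t = I.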
Moreover, we design a practical version of PaZO that stabilizes training via a diagonal Hessian estimate and a moving-average technique.
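As a rough illustration of this kind of practical scheme (a sketch only, not the paper's exact PaZO algorithm), the snippet below combines a MeZO-style two-point SPSA estimate with a diagonal preconditioner maintained as a moving average of squared gradient estimates, used here as a stand-in for a diagonal Hessian estimate. The function name and all hyperparameters are illustrative assumptions.

    import torch

    def preconditioned_zo_step(params, loss_fn, precond, eps=1e-3, lr=1e-6, beta=0.99, delta=1e-8):
        """One illustrative preconditioned zeroth-order step (sketch, not the paper's exact method).

        params  : list of torch.nn.Parameter, updated in place
        loss_fn : callable returning a scalar loss for the current parameters
        precond : list of tensors holding a moving-average diagonal curvature estimate
        """
        # Fix a seed so the same random perturbation u can be replayed later.
        seed = torch.randint(0, 2**31 - 1, (1,)).item()

        def perturb(scale):
            torch.manual_seed(seed)
            for p in params:
                p.data.add_(scale * eps * torch.randn_like(p))

        # Two-point SPSA evaluations at theta + eps*u and theta - eps*u.
        perturb(+1.0)
        loss_plus = loss_fn()
        perturb(-2.0)
        loss_minus = loss_fn()
        perturb(+1.0)  # restore the original parameters

        grad_scale = (loss_plus - loss_minus) / (2.0 * eps)

        # Preconditioned update; the diagonal estimate is refreshed by a moving average.
        torch.manual_seed(seed)
        for p, h in zip(params, precond):
            u = torch.randn_like(p)
            g = grad_scale * u                          # ZO gradient estimate along u
            h.mul_(beta).add_((1.0 - beta) * g * g)     # moving-average diagonal curvature proxy
            p.data.add_(-lr * g / (h.sqrt() + delta))   # preconditioned step

Replaying the perturbation from the stored seed avoids keeping a copy of u in memory, which is the usual trick for making zeroth-order fine-tuning memory-efficient at LLM scale.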
Extensive experiments on diverse downstream tasks with models such as RoBERTa-large and OPT demonstrate PaZO's effectiveness: compared with other zeroth-order baselines, PaZO achieves better performance across models and tasks.