Poster Session 5 · Friday, December 5, 2025 11:00 AM → 2:00 PM
#2613

CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning

NeurIPS Poster OpenReview

Abstract

Parameter-efficient fine-tuning (PEFT) is essential for adapting large foundation models without excessive storage cost. However, current approaches such as LoRA treat each layer’s adaptation independently, overlooking correlations across layers. This independence causes the number of trainable parameters to grow linearly with model depth.
We provide theoretical and empirical evidence that skip connections in transformers create smooth gradient propagation across layers. This smoothness leads to weight adaptations that concentrate most of their energy in low-frequency spectral components, especially along the layer dimension. Empirical analysis confirms this effect, showing that most of the adaptation energy lies in low frequencies.
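The low-frequency concentration described above can be illustrated with a toy experiment: stack per-layer adaptation matrices into one tensor, take a Fourier transform along the layer axis, and measure how much energy sits in the lowest frequency bins. A minimal sketch, using a synthetic smoothly varying adaptation stand-in (all shapes and the noise level here are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Toy stand-in for per-layer weight adaptations: a component that varies
# smoothly across depth (as skip connections would induce) plus small noise.
L, d = 24, 32  # hypothetical layer count and weight dimension
rng = np.random.default_rng(1)
layers = np.arange(L)[:, None, None]
base = rng.standard_normal((1, d, d))
smooth = np.cos(np.pi * layers / L) * base            # slowly varying over layers
deltas = smooth + 0.05 * rng.standard_normal((L, d, d))

# Spectrum along the layer dimension only.
spec = np.fft.rfft(deltas, axis=0)
energy = np.abs(spec) ** 2

# Fraction of total spectral energy in the 3 lowest layer-frequencies.
low_frac = energy[:3].sum() / energy.sum()
print(f"low-frequency energy fraction: {low_frac:.3f}")
```

On smooth inputs like this, the printed fraction is close to 1, which is the effect the abstract reports for real adaptation tensors.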
Building on this insight, we propose CrossSpectra, which parameterizes all attention-weight adaptations across layers as a single 3D tensor and represents that tensor with a small set of sparse spectral coefficients. By keeping only a few non-zero coefficients within each layer's frequency space and truncating high frequencies along the layer dimension, CrossSpectra's trainable-parameter count grows far more slowly than LoRA's, which scales linearly with both the number of layers and the rank.
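The parameterization can be sketched as follows: store only sparse low-frequency coefficients of the stacked adaptation tensor and reconstruct all layers' updates with one inverse FFT. This is a hedged illustration of the general idea, not the paper's implementation; the shapes, the truncation depth `k_layer`, and the per-slice budget `k_intra` are hypothetical:

```python
import numpy as np

# Hypothetical shapes: L transformer layers, each with a d x d weight update.
L, d = 12, 64
rng = np.random.default_rng(0)

# Dense spectral coefficients of the stacked adaptation tensor (L, d, d).
coeffs = np.fft.fftn(rng.standard_normal((L, d, d)))

# Truncate to the k_layer lowest frequencies along the layer axis ...
k_layer = 3
mask = np.zeros((L, d, d), dtype=bool)
mask[:k_layer] = True

# ... and keep only the k_intra largest-magnitude coefficients per kept slice.
k_intra = 128
for f in range(k_layer):
    mags = np.abs(coeffs[f])
    thresh = np.partition(mags.ravel(), -k_intra)[-k_intra]
    mask[f] &= mags >= thresh

sparse = np.where(mask, coeffs, 0)

# One inverse 3D FFT reconstructs every layer's adaptation matrix at once.
delta_W = np.fft.ifftn(sparse).real

# Stored coefficients: at most k_layer * k_intra, independent of how the
# per-layer cost of LoRA-style factors would accumulate over L layers.
n_params = int(mask.sum())
print(delta_W.shape, n_params)
```

In training, the sparse coefficients would be the learnable parameters and the inverse transform would materialize the per-layer updates on the fly.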
Across natural-language and vision benchmarks, CrossSpectra matches or surpasses baseline performance while using substantially fewer parameters than LoRA, requiring only a small fraction of LoRA's parameter count when fine-tuning LLaMA-7B on instruction-following tasks. These results show that exploiting the architectural smoothness of transformers through spectral analysis yields major efficiency gains in PEFT.