Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#3409
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Abstract
Large Language Models (LLMs) have yet to effectively leverage the vast amounts of edge-device data, and Federated Learning (FL) offers a promising paradigm to collaboratively fine-tune LLMs without transferring private edge data to the cloud. To operate within the computational and communication constraints of edge devices, recent literature on federated fine-tuning of LLMs proposes the use of low-rank adaptation (LoRA) and similar parameter-efficient methods. However, LoRA-based methods suffer from accuracy degradation in FL settings, primarily because of data and computational heterogeneity across clients.
We propose Ravan, an adaptive multi-head LoRA method that balances parameter efficiency and model expressivity by reparameterizing the weight updates as the sum of multiple LoRA heads, ΔW = Σ_i s_i B_i H_i A_i, in which only the core parameters H_i and their lightweight scaling factors s_i are trained while the bases B_i and A_i stay frozen. These trainable scaling factors let the optimization focus on the most useful heads, recovering a higher-rank approximation of the full update without increasing the number of communicated parameters, since clients upload the H_i and s_i directly.
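The multi-head reparameterization above can be illustrated with a small numerical sketch. This is not the authors' implementation; the dimensions, head count, and per-head rank below are illustrative assumptions, and the frozen-base/trainable-core split follows the description in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 64   # layer dimensions (illustrative)
num_heads, r = 4, 4    # number of LoRA heads and per-head rank (illustrative)

# Frozen random bases B_i (d_out x r) and A_i (r x d_in); only the small
# core matrices H_i (r x r) and scalars s_i would be trained and communicated.
B = [rng.standard_normal((d_out, r)) for _ in range(num_heads)]
A = [rng.standard_normal((r, d_in)) for _ in range(num_heads)]
H = [rng.standard_normal((r, r)) for _ in range(num_heads)]
s = np.ones(num_heads)

# Reparameterized update: delta_W = sum_i s_i * B_i @ H_i @ A_i
delta_W = sum(s[i] * B[i] @ H[i] @ A[i] for i in range(num_heads))

# Each head contributes rank <= r, so the sum can reach rank up to
# num_heads * r, while the trainable payload is only num_heads * (r*r + 1).
print(delta_W.shape, np.linalg.matrix_rank(delta_W))
```

With generic random bases, the summed update generically attains rank num_heads * r (here 16), which is the sense in which multiple small heads recover a higher-rank approximation than a single rank-r LoRA head of the same trainable size.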
Experiments on vision and language benchmarks show that Ravan improves test accuracy by 2–8% over prior parameter-efficient baselines, making it a robust and scalable solution for federated fine-tuning of LLMs.