Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

Alexander Long, Chamin P Hewa Koneputugodage, Thalaiyasingam Ajanthan, Yan Zuo, Gil Avraham, Violetta Shevchenko, Hadi Mohaghegh Dolatabadi, Sameera Ramasinghe

Pluralis Research

Keywords: Decentralized training, LLMs, Distributed training, Open source, Weight secrecy

NeurIPS ⋅ Poster ⋅ OpenReview

Abstract

We consider a decentralized setup in which the participants collaboratively train and serve a large neural network, and where each participant only processes a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant.

We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure that model shards (i.e., subsets) held by participants at different time steps are incompatible. UPMs periodically inject time-varying, random, invertible transforms at participant boundaries, preserving the overall network function yet rendering cross-time assemblies incoherent.
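The boundary-transform idea can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the two-shard network, the helper names (`make_shards`, `apply_transform`, `forward`), and the choice of a random orthogonal matrix as the invertible transform are all hypothetical and chosen only for illustration. Shard A's output projection is left-multiplied by a transform $T$ and shard B's input projection is right-multiplied by $T^{-1}$, so the composed function is unchanged, while pairing an old shard with a new one gives a different function.

```python
import numpy as np


def random_invertible(dim, rng):
    # Assumption: use an orthogonal matrix (Q factor of a Gaussian matrix),
    # which is always invertible and well conditioned.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def make_shards(d_in, d_hidden, d_boundary, d_out, rng):
    # Each hypothetical shard is a tiny MLP: w_out @ tanh(w_in @ x).
    def init(rows, cols):
        return rng.standard_normal((rows, cols)) / np.sqrt(cols)

    shard_a = {"w_in": init(d_hidden, d_in), "w_out": init(d_boundary, d_hidden)}
    shard_b = {"w_in": init(d_hidden, d_boundary), "w_out": init(d_out, d_hidden)}
    return shard_a, shard_b


def run_shard(shard, x):
    return shard["w_out"] @ np.tanh(shard["w_in"] @ x)


def forward(shard_a, shard_b, x):
    # Shard A sends its output across the participant boundary to shard B.
    return run_shard(shard_b, run_shard(shard_a, x))


def apply_transform(shard_a, shard_b, t):
    # A emits T h instead of h; B absorbs T^{-1} into its input projection.
    new_a = {"w_in": shard_a["w_in"], "w_out": t @ shard_a["w_out"]}
    new_b = {"w_in": shard_b["w_in"] @ np.linalg.inv(t), "w_out": shard_b["w_out"]}
    return new_a, new_b


rng = np.random.default_rng(0)
a0, b0 = make_shards(d_in=8, d_hidden=16, d_boundary=8, d_out=4, rng=rng)
a1, b1 = apply_transform(a0, b0, random_invertible(8, rng))
x = rng.standard_normal(8)

# Same-time shards agree with the original network.
same_time_matches = np.allclose(forward(a0, b0, x), forward(a1, b1, x))
# A cross-time assembly (old shard A, new shard B) does not.
cross_time_matches = np.allclose(forward(a0, b0, x), forward(a0, b1, x))
print(same_time_matches, cross_time_matches)
```

In this toy, an adversary who collects shard A before the transform and shard B after it holds weights that no longer compose into the original function, which is the incompatibility the paragraph above describes.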

On Qwen-2.5-0.5B and Llama-3.2-1B, 10 000 transforms leave FP32 perplexity unchanged ($\Delta\mathrm{PPL} < 0.01$; Jensen–Shannon drift $< 4 \times 10^{-5}$), and we show how to control growth for lower-precision datatypes. Applying a transform every 30 s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and < 1% memory.

We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes $\geq 60\%$ of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.