Poster Session 2 · Wednesday, December 3, 2025 4:30 PM → 7:30 PM
#1207
InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy
Abstract
As major progress in LLM-based long-form text generation enables paradigms such as retrieval-augmented generation (RAG) and inference-time scaling, safely incorporating private information into the generation remains a critical open question.
We present InvisibleInk, a highly scalable long-form text generation framework that satisfies rigorous differential privacy guarantees with respect to the sensitive reference texts. It interprets sampling from the LLM's next-token distribution as the exponential mechanism over the LLM logits, and introduces two innovations.
- First, we reduce the privacy cost by isolating and clipping only the sensitive information in the model logits (relative to the public logits).
- Second, we improve text quality by sampling, without any privacy cost, from a small superset of the top-k private tokens.
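The two ideas above can be sketched as a single private decoding step. This is a minimal illustration, not the authors' implementation: the function name, the clipping threshold, and the choice of candidate set (the union of top-k tokens under the clipped-private and public logits) are all assumptions made for the example; sampling from a softmax over bounded scores is one standard realization of the exponential mechanism.

```python
import numpy as np

def invisibleink_step(private_logits, public_logits, clip=1.0, k=8, rng=None):
    """One hypothetical private decoding step (illustrative sketch).

    private_logits: logits from the LLM conditioned on sensitive references
    public_logits:  logits from the same LLM without the sensitive context
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # (1) Isolate the sensitive contribution to the logits and clip its
    #     magnitude, bounding each step's sensitivity (and hence privacy cost).
    delta = np.clip(private_logits - public_logits, -clip, clip)
    clipped = public_logits + delta
    # (2) Restrict sampling to a small candidate superset: here, the union of
    #     the top-k tokens under the clipped-private and the public logits.
    cand = np.union1d(np.argsort(clipped)[-k:], np.argsort(public_logits)[-k:])
    # Exponential mechanism over bounded scores == softmax sampling.
    scores = clipped[cand]
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(rng.choice(cand, p=probs))
```

Because the clipped difference is bounded by `clip`, the per-token privacy loss of each step is bounded as well, which is what makes the mechanism's cost accountable across a long generation.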
Empirical evaluations demonstrate a consistent reduction in computation cost over state-of-the-art baselines when generating long-form private text of the same utility, across privacy levels. InvisibleInk is able to generate, for the first time, high-quality private long-form text at a small constant multiple of the computation cost of non-private generation, paving the way for its practical use.