Poster Session 1 · Wednesday, December 3, 2025 11:00 AM → 2:00 PM
#3309
Nearly-Linear Time and Massively Parallel Algorithms for -anonymity
Abstract
-anonymity is a widely-used privacy-preserving concept that ensures each record in a dataset is indistinguishable from at least other records. In this paper, we revisit -anonymity by suppression and give an -approximation algorithm with a nearly-linear runtime of for an arbitrary constant , where is the number of records and is the number of attributes. Previous algorithms with provable guarantees either:
- achieve the same approximation ratio but require at least runtime, or
- provide a better approximation ratio at the cost of an impractical worst-case runtime for general and .
Our algorithm extends to the Massively Parallel Computation (MPC) model, where it can be adapted into an MPC algorithm requiring rounds and total space . Empirically, we also demonstrate that our algorithmic ideas can be adapted to existing heuristic methods, leading to significant speed-ups while preserving comparable performance.
Although PS07 introduced improvements to achieve more practical runtimes for their -approximation algorithm, its worst-case runtime remains . A natural question arises: can we develop an algorithm with an approximation ratio and a polynomial runtime?
We investigate the single-point -anonymity problem, where the goal is to select additional records to make a given record indistinguishable. Surprisingly, assuming the dense vs random conjecture in complexity theory, we show that for , no algorithm can achieve a approximation in time. This provides evidence of the inherent hardness of the -anonymity problem.