On Traceability in $ℓ_{p}$ Stochastic Convex Optimization

Sasha Voitovych, Mahdi Haghifam, Idan Attias, Gintare Karolina Dziugaite, Roi Livni, Daniel M. Roy

generalization theory differential privacy learning theory

Abstract

In this paper, we investigate the necessity of traceability for accurate learning in stochastic convex optimization (SCO) under

ℓ_{p}

geometries. Informally, we say a learning algorithm is $m$ -traceable if, by analyzing its output, it is possible to identify at least

m

of its training samples.

Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every

p \in 1, \infty)

, we establish the existence of an excess risk threshold below which every sample-efficient learner is traceable with the number of samples which is a constant fraction of its training sample. For

p \in

1,2, this threshold coincides with the best excess risk of differentially private (DP) algorithms, i.e., above this threshold, there exist algorithms that are not traceable, which corresponds to a sharp phase transition. For

p \in (2, \infty)

, this threshold instead gives novel lower bounds for DP learning, partially closing an open problem in this setup.

En route to establishing these results, we prove a sparse variant of the fingerprinting lemma, which is of independent interest to the community.