Poster Session 2 · Wednesday, December 3, 2025 4:30 PM → 7:30 PM
#5310
Balancing Positive and Negative Classification Error Rates in Positive-Unlabeled Learning
Abstract
Positive and Unlabeled (PU) learning is a special case of binary classification with weak supervision, where only positive labeled data and unlabeled data are available. Previous studies propose several risk estimators for PU learning, such as non-negative PU (nnPU), which are unbiased and consistent estimators of the expected risk of supervised binary classification.
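For reference, the nnPU estimator of Kiryo et al. (2017) can be written as

$$\tilde{R}_{\mathrm{pu}}(g) = \pi_p \hat{R}_p^+(g) + \max\bigl\{0,\ \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)\bigr\},$$

where $\pi_p$ is the positive class prior, $g$ is the classifier, $\hat{R}_p^+$ and $\hat{R}_p^-$ are the empirical risks of the labeled positives treated as positive and as negative, respectively, and $\hat{R}_u^-$ is the empirical risk of the unlabeled data treated as negative.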
In nnPU, the negative-class empirical risk is estimated from positive labeled and unlabeled data under a non-negativity constraint. However, this estimator tends toward 0 during training, so the negative class is overfitted, resulting in imbalanced error rates between the positive and negative classes. To address this problem, we posit that the expected risks of the positive class and the negative class should be close. Accordingly, we constrain the negative-class empirical risk estimator to be lower bounded by the positive-class empirical risk rather than by 0, and we also incorporate an explicit equality constraint between the two.
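On one reading of this description (the notation below is ours, not the paper's), the resulting estimator replaces the 0 lower bound of nnPU with the positive-class term and adds an equality constraint:

$$\hat{R}(g) = \pi_p \hat{R}_p^+(g) + \max\bigl\{\pi_p \hat{R}_p^+(g),\ \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g)\bigr\} \quad \text{s.t.} \quad \pi_p \hat{R}_p^+(g) = \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g).$$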
We propose a risk estimator for PU learning that balances positive and negative classification error rates, together with an efficient training method based on the augmented Lagrange multiplier framework. We theoretically analyze the estimation error of the proposed estimator and empirically validate that it achieves higher accuracy and converges more stably than other risk estimators for PU learning. The proposed method also achieves accuracy competitive with practical PU learning methods.
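As a rough illustration of how such a constrained estimator could be trained under the augmented Lagrange multiplier framework, here is a minimal PyTorch sketch; the toy data, linear model, logistic loss, known class prior, and multiplier update schedule are all our assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy PU data (assumed): labeled positives x_p and a positive/negative mixture x_u.
prior = 0.4                                    # pi_p, assumed known as in nnPU
x_p = torch.randn(200, 2) + 2.0                # labeled positives
x_u = torch.cat([torch.randn(80, 2) + 2.0,     # unlabeled positives
                 torch.randn(120, 2) - 2.0])   # unlabeled negatives

model = torch.nn.Linear(2, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

def class_risks():
    """Empirical class-conditional risks under the logistic loss softplus(-z)."""
    g_p = model(x_p).squeeze(-1)
    g_u = model(x_u).squeeze(-1)
    r_pos = prior * F.softplus(-g_p).mean()                          # pi_p * R_p^+(g)
    r_neg = F.softplus(g_u).mean() - prior * F.softplus(g_p).mean()  # R_u^- - pi_p * R_p^-
    return r_pos, r_neg

# Augmented Lagrangian for the equality constraint r_pos = r_neg:
#   L = r_pos + max(r_pos, r_neg) + lam * gap + (rho / 2) * gap^2,  gap = r_pos - r_neg,
# where the max replaces nnPU's lower bound of 0 with the positive-class risk.
lam, rho = 0.0, 1.0
for step in range(500):
    r_pos, r_neg = class_risks()
    gap = r_pos - r_neg
    loss = r_pos + torch.maximum(r_pos, r_neg) + lam * gap + 0.5 * rho * gap**2
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % 50 == 0:                   # periodic dual ascent on the multiplier
        with torch.no_grad():
            r_pos, r_neg = class_risks()
            lam += rho * (r_pos - r_neg).item()
```

The dual update `lam += rho * gap` is the standard method-of-multipliers step; how often it fires and the value of `rho` are tuning choices in this sketch.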