Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#3109 Spotlight

Is the acquisition worth the cost? Surrogate losses for Consistent Two-stage Classifiers

NeurIPS OpenReview

Abstract

Recent years have witnessed the emergence of a spectrum of foundation models, covering a broad range of capabilities and costs. Often, we effectively use foundation models as feature generators and train classifiers that use the outputs of these models to make decisions.
In this paper, we consider an increasingly relevant setting with two classifier stages. The first stage has access to an initial set of features and can either make a classification decision or defer, at a cost, to a second classifier that has access to the initial features together with additional, costly-to-acquire features. This is similar to the "learning to defer" setting, with the important difference that we train both classifiers jointly, and the second classifier has access to more information.
The natural loss for this setting is a 0-1-style loss, in which a penalty is paid for incorrect classification and an additional penalty is paid for consulting the second classifier. This loss is unwieldy for training. Our primary contribution in this paper is the derivation of a hinge-based surrogate loss that is much more amenable to training and also satisfies the property that consistency with respect to the surrogate implies consistency with respect to the target loss.
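To make the structure of the target loss concrete, here is a minimal sketch of one plausible form of a two-stage 0-1-plus-cost loss. The deferral cost `c`, the function name, and the exact composition of the penalties are illustrative assumptions, not the paper's definitions:

```python
def two_stage_loss(y, y1_pred, y2_pred, defer, c=0.3):
    """Illustrative two-stage target loss (an assumed form, not the
    paper's exact definition): if the first classifier predicts, pay
    1 for a mistake; if it defers, pay the acquisition cost c plus 1
    for a second-stage mistake."""
    if defer:
        return c + (1.0 if y2_pred != y else 0.0)
    return 1.0 if y1_pred != y else 0.0

# Correct first-stage prediction, no deferral: loss 0.
print(two_stage_loss(y=1, y1_pred=1, y2_pred=1, defer=False))   # 0.0
# Defer and the second stage is correct: pay only the cost c.
print(two_stage_loss(y=1, y1_pred=-1, y2_pred=1, defer=True))   # 0.3
# Defer and the second stage is wrong: pay c plus the 0-1 penalty.
print(two_stage_loss(y=1, y1_pred=1, y2_pred=-1, defer=True))   # 1.3
```

The 0-1 indicators make this loss piecewise constant in the classifiers' scores, which is what makes it unwieldy for gradient-based training and motivates a convex hinge-based surrogate.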