We decompose the gap between selective classifiers and the ideal oracle into five measurable sources, showing that only non-monotone scoring methods can reduce it and improve reliability.
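The claim that only non-monotone scoring methods can shrink the gap follows from a general property of selective classification. Accept/reject decisions depend only on how the confidence score ranks samples, so any strictly increasing transformation of the score leaves the risk-coverage curve unchanged. Any gap measured on that ranking stays fixed as well.

The NumPy sketch below illustrates this on synthetic data. It makes several assumptions:

- Excess AURC (the area under the risk-coverage curve minus the oracle's) stands in for "the gap".
- The helper names `risk_coverage`, `aurc`, and `excess_aurc` are hypothetical.
- The re-ranked score borrows the correctness labels, purely as a stand-in for the extra signal a real non-monotone method would have to estimate.

This is not the paper's five-source decomposition.

```python
import numpy as np


def risk_coverage(scores, correct):
    """Selective risk at every coverage level, accepting the highest-scoring samples first."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    errors = 1.0 - np.asarray(correct, dtype=float)[order]
    accepted = np.arange(1, len(order) + 1)
    coverage = accepted / len(order)
    risk = np.cumsum(errors) / accepted
    return coverage, risk


def aurc(scores, correct):
    """Area under the risk-coverage curve, taken as the mean risk over the n coverage levels."""
    _, risk = risk_coverage(scores, correct)
    return float(risk.mean())


def excess_aurc(scores, correct):
    """Hypothetical gap measure: AURC minus that of an oracle ranking every correct sample above every error."""
    oracle_scores = np.asarray(correct, dtype=float)
    return aurc(scores, correct) - aurc(oracle_scores, correct)


rng = np.random.default_rng(0)
n = 1000
scores = rng.uniform(size=n)              # synthetic confidence scores
correct = rng.uniform(size=n) < scores    # higher score -> more likely correct

gap = excess_aurc(scores, correct)
# Strictly increasing map of the score: same ranking, so the same gap.
gap_monotone = excess_aurc(np.exp(3.0 * scores), correct)
# Re-ranking with a second signal (here label-derived, for illustration only):
# not a monotone function of the original score, so the gap can shrink.
gap_reranked = excess_aurc(scores + 0.5 * correct, correct)

print(f"gap={gap:.4f} monotone={gap_monotone:.4f} reranked={gap_reranked:.4f}")
```

Because the bonus only lets correct samples move ahead of errors, no error is ever ranked earlier than before. The selective risk therefore cannot rise at any coverage level, while the monotone rescaling reproduces the original gap exactly.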