2 papers across 2 sessions
We decompose the gap between selective classifiers and the ideal oracle into five measurable sources, showing that only non-monotone scoring methods can reduce it and improve reliability.
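The summary does not list the five sources. The sketch below is a minimal illustration of the monotone-versus-non-monotone point only. It assumes the gap is measured as oracle accuracy minus the scorer's selective accuracy at a single coverage level, which may differ from the papers' exact definition. All function names and the toy data are hypothetical, and the code uses Python with NumPy. Because selective accuracy depends only on how the scores *rank* the samples, a strictly increasing rescoring leaves the gap unchanged. Only a rescoring that changes the order can move it.

```python
import numpy as np


def selective_accuracy(scores: np.ndarray, correct: np.ndarray, coverage: float) -> float:
    """Accuracy on the `coverage` fraction of samples with the highest scores."""
    n = len(scores)
    k = min(n, max(1, int(round(coverage * n))))
    # Rank by descending score; only the ORDER of the scores matters here.
    order = np.argsort(-scores, kind="stable")
    return float(correct[order[:k]].mean())


def oracle_accuracy(correct: np.ndarray, coverage: float) -> float:
    """Ideal oracle: accepts every correct prediction before any incorrect one."""
    n = len(correct)
    k = min(n, max(1, int(round(coverage * n))))
    return float(min(k, correct.sum()) / k)


def selective_gap(scores: np.ndarray, correct: np.ndarray, coverage: float) -> float:
    """Oracle accuracy minus the scorer's selective accuracy at one coverage level."""
    return oracle_accuracy(correct, coverage) - selective_accuracy(scores, correct, coverage)


# Hypothetical toy data: confidence scores and 0/1 correctness of each prediction.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
correct = np.array([1, 0, 1, 1, 0, 1], dtype=float)

base = selective_gap(scores, correct, 0.5)
# A strictly increasing transform (cubing positive scores) preserves the ranking,
# so the gap is identical.
monotone = selective_gap(scores ** 3, correct, 0.5)
# A rescoring that changes the order (demoting sample 1) can close the gap.
reranked = selective_gap(np.array([0.9, 0.3, 0.7, 0.6, 0.5, 0.4]), correct, 0.5)
print(base, monotone, reranked)
```

At 50% coverage, the original scores accept one wrong prediction, giving a gap of about 1/3. Cubing the scores yields exactly the same gap. The reordered scores match the oracle.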