We reliably predict the behavior of black-box language models by training predictors on their responses to follow-up questions.
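As a rough illustration of the idea, the sketch below trains a small predictor on follow-up responses. Everything in it is an assumption made for illustration, not the paper's actual setup: the data is synthetic, each feature stands in for the probability that a model answers "yes" to one hypothetical follow-up question, the binary label stands in for some behavior of interest, and logistic regression is just one plausible choice of predictor. The helper names (`make_synthetic_example`, `train_predictor`, `predict`) are likewise hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_example(rng, n_questions=4):
    # ASSUMPTION: synthetic stand-in for real data. The label marks whether
    # the behavior of interest is present; each feature mimics the model's
    # "yes" probability on one follow-up question, shifted by the label.
    label = rng.randint(0, 1)
    center = 0.7 if label else 0.3
    features = [min(1.0, max(0.0, rng.gauss(center, 0.1)))
                for _ in range(n_questions)]
    return features, label


def train_predictor(features, labels, lr=0.5, epochs=500):
    # Logistic regression fitted by batch gradient descent on the log loss.
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    # True when the predicted probability of the behavior is at least 0.5.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


rng = random.Random(0)
train = [make_synthetic_example(rng) for _ in range(200)]
test = [make_synthetic_example(rng) for _ in range(100)]

weights, bias = train_predictor([x for x, _ in train], [y for _, y in train])

correct = sum(predict(weights, bias, x) == bool(y) for x, y in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The predictor only ever sees the follow-up responses, never the model's internals, which is what makes the approach applicable to a black-box model. With real data, the feature vectors would come from querying the model, and the choice of follow-up questions and predictor would follow the paper.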