2 papers across 2 sessions
ResponseRank enables data-efficient learning of distance-aware reward models through stratified comparison strength rankings.
We propose statistically robust preference-learning methods that use response times when estimating rewards, yielding large gains in statistical efficiency.
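The summaries above do not say how response times enter reward estimation. One common way to model the link, which we assume here purely for illustration, is a symmetric drift-diffusion model (DDM). Its drift `v` plays the role of a reward difference between two responses, and `a` is the decision barrier. Under this model, E[choice] = tanh(a·v) and E[response time] = (a/v)·tanh(a·v), so their ratio recovers v/a without a logit transform. A logit transform becomes unstable when one option is chosen almost every time. The sketch below is hypothetical: `simulate_ddm_trial` and `drift_over_barrier` are illustrative names, not APIs from either paper, and it ignores non-decision time, which real response times include.

```python
import math
import random
from statistics import fmean


def simulate_ddm_trial(drift, barrier, rng, dt=2e-3):
    """Hypothetical helper: simulate one symmetric DDM trial (unit noise,
    start at 0, absorbing barriers at +barrier and -barrier) with an
    Euler step of size dt. Returns (choice, response_time), choice in {+1, -1}."""
    x, t = 0.0, 0.0
    noise_sd = math.sqrt(dt)
    while abs(x) < barrier:
        x += drift * dt + rng.gauss(0.0, noise_sd)
        t += dt
    return (1 if x > 0 else -1), t


def drift_over_barrier(choices, response_times):
    """Hypothetical moment estimator of v/a, assuming the DDM above:
    E[C] = tanh(a*v) and E[T] = (a/v) * tanh(a*v), so E[C] / E[T] = v / a."""
    return fmean(choices) / fmean(response_times)


# Usage: the true drift/barrier ratio in this simulation is 0.5 / 1.0 = 0.5.
rng = random.Random(0)
trials = [simulate_ddm_trial(drift=0.5, barrier=1.0, rng=rng) for _ in range(1000)]
choices, rts = zip(*trials)
print(drift_over_barrier(choices, rts))
```

In this sketch, a response time is treated as a fast, cheap signal of preference strength. Faster choices indicate a larger drift, so each comparison carries more information than its binary outcome alone.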