We proposed a framework for reinforcing large reasoning models with discriminative constrained optimization, grounded in the principle of increasing the scores of positive answers while decreasing those of negative ones.
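As a rough illustration of the stated principle only, the sketch below computes a contrast between the scores of positive and negative answers. It is not the framework's actual objective: the function name `discriminative_objective`, the use of plain scalar scores, and the mean-difference form are all assumptions made for this example, and the constraint part of the optimization is omitted entirely.

```python
from typing import Sequence


def discriminative_objective(
    scores: Sequence[float], is_positive: Sequence[bool]
) -> float:
    """Hypothetical contrast: mean positive score minus mean negative score.

    Maximizing this value pushes positive-answer scores up and
    negative-answer scores down. How scores are defined, and what
    constraint is imposed during optimization, are not modeled here.
    """
    if len(scores) != len(is_positive):
        raise ValueError("scores and is_positive must have the same length")

    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]

    # With only positives or only negatives there is nothing to contrast.
    if not pos or not neg:
        return 0.0

    return sum(pos) / len(pos) - sum(neg) / len(neg)


if __name__ == "__main__":
    # Two answers marked positive, two marked negative (illustrative values).
    print(discriminative_objective([1.0, 3.0, 0.0, 2.0], [True, True, False, False]))
```

In this toy form, raising any positive answer's score increases the value and raising any negative answer's score decreases it, which is the behavior the principle describes.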