PhD student, Texas A&M University - College Station
1 paper at NeurIPS 2025
We proposed a framework for reinforcing large reasoning models with discriminative constrained optimization, grounded in the principle of increasing the scores of positive answers while decreasing those of negative ones.