7 papers across 3 sessions
A novel, systematic evaluation of cognitive reasoning in VLMs, offering clues about the sources of reasoning bottlenecks along with simple, effective solutions.
Using RL to teach LLMs to generate their own training data and update themselves, adapting to new knowledge and tasks.
We show that limiting a model's confidence during training can improve test-time scaling in mathematical reasoning.
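The note above does not specify how confidence is limited, so here is a minimal, generic sketch of one such mechanism: a cross-entropy loss with a hinge penalty on over-confident predictions. The function name `capped_confidence_loss` and the `cap`/`penalty_weight` parameters are illustrative assumptions, not the paper's method.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def capped_confidence_loss(logits, target, cap=0.9, penalty_weight=1.0):
    """Cross-entropy plus a hinge penalty whenever the model's top
    probability exceeds `cap`, discouraging over-confident predictions.
    (Hypothetical sketch; the actual mechanism in the paper may differ.)"""
    probs = softmax(logits)
    ce = -math.log(probs[target])
    overconfidence = max(0.0, max(probs) - cap)
    return ce + penalty_weight * overconfidence
```

With confident logits such as `[5.0, 0.0, 0.0]` the top probability exceeds the cap and the penalty is active; with moderate logits such as `[1.0, 0.0, 0.0]` the loss reduces to plain cross-entropy.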