KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
#2014 Spotlight · Jiajun Shi, Jian Yang, Jiaheng Liu, Xingyuan Bu, Jiangjie Chen, Junting Zhou, Kaijing Ma, Zhoufutu Wen, Bingli Wang, Yancheng He, Liang Song, Hualei Zhu, Shilong Li, Xingjian Wang, Wei Zhang, Ruibin Yuan, Yifan Yao, Wenjun Yang, Yunli Wang, Siyuan Fang, Siyu Yuan, Qianyu He, Robert Tang, Yingshui Tan, Wangchunshu Zhou, ZHAO-XIANG ZHANG, Zhoujun Li, Wenhao Huang, Ge Zhang
We propose KORGym, a dynamic, game-based benchmark offering over 50 interactive tasks with RL support for multi-turn LLM reasoning evaluation. We validate its effectiveness through extensive experiments, which reveal several key insights.