PhD student, Beijing University of Posts and Telecommunications
2 papers at NeurIPS 2025
This paper investigates how hallucinations arise and persist in RLLM reasoning, revealing error self-reinforcement and limited metacognition.
Our proposed orthogonality and variance losses enhance expert specificity in Mixture-of-Experts models, counteracting the expert homogenization induced by load-balancing objectives while still keeping the load balanced, and thereby improve downstream fine-tuning performance.
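The two auxiliary losses can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the orthogonality loss penalizes pairwise cosine similarity between flattened expert weight vectors, and the variance loss rewards decisive (high-variance) router probabilities; the actual formulation in the paper may differ.

```python
import numpy as np

def orthogonality_loss(expert_weights):
    # expert_weights: (num_experts, d) array, one flattened weight vector per expert.
    # Hypothetical interface: penalize squared pairwise cosine similarity,
    # pushing experts toward mutually orthogonal (more specialized) weights.
    w = expert_weights / np.linalg.norm(expert_weights, axis=1, keepdims=True)
    gram = w @ w.T                      # pairwise cosine similarities
    n = w.shape[0]
    off_diag = gram - np.eye(n)         # drop self-similarity on the diagonal
    return (off_diag ** 2).sum() / (n * (n - 1))

def variance_loss(router_probs):
    # router_probs: (batch, num_experts) softmax routing probabilities.
    # Negated variance: minimizing it rewards decisive per-token routing,
    # counteracting the uniform routing that load balancing alone encourages.
    return -router_probs.var(axis=1).mean()

# Orthogonal experts incur zero penalty; identical experts are penalized.
print(orthogonality_loss(np.eye(2)))                    # 0.0
print(orthogonality_loss(np.array([[1., 0.], [1., 0.]])))  # 1.0
```

In training, both terms would simply be added (with small coefficients) to the task loss alongside the usual load-balancing objective.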