2 papers across 2 sessions
Our empirically and theoretically grounded method, which treats diversity as a reward, achieves new state-of-the-art average performance across 7 benchmarks when fine-tuning SOTA LLMs on domain-undetermined data.
We propose MindGYM, a thinking-centric data synthesis framework that injects cognitive traits into QA generation, enabling language and vision-language models to self-synthesize high-quality, low-variance data for efficient fine-tuning.