4 papers across 2 sessions
We propose a framework for efficient MoE post-training on 3.5D wafer-scale chiplets.
A novel semi-supervised learning paradigm that unifies view-wise co-training, meta-learned supervision, and adversarial perturbation through a structured triadic game.
Uncovering the Role of Long-Context Ability in Reasoning Training
We introduce a new method for selecting subspaces in low-rank optimization for memory-efficient pretraining of large language models (LLMs).