5 papers across 3 sessions
We propose a framework for efficient MoE post-training on 3.5D wafer-scale chiplets.
We introduce a novel inference-time steering methodology called Reinforcing Cognitive Experts (RICE), designed to improve reasoning depth and efficiency without additional training or complex heuristics.
We present a novel MoE architecture that extends the mixture-of-experts paradigm to both attention and feed-forward layers, with unified expert designs and attention-FFN parameter sharing.