1 paper across 1 session
Using a heterogeneous Mixture-of-Experts model architecture, we show that brain-like processing pathways form due to inductive biases on processing complexity and expert dropout.
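
To make the ingredients named above concrete, here is a minimal sketch of a heterogeneous Mixture-of-Experts forward pass with expert dropout. It is an illustration under stated assumptions, not the paper's implementation: the three toy experts, their cost values, the `complexity_penalty` term, and the helper names `route` and `moe_forward` are all hypothetical. Experts differ in complexity, a softmax router mixes their outputs, dropout randomly masks experts before routing, and the penalty stands in for a bias toward cheaper experts.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax; logits of -inf receive exactly zero weight.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, drop_prob=0.0, rng=None):
    """Return routing weights over experts, with optional expert dropout."""
    rng = rng or random.Random()
    # Each expert is kept independently with probability 1 - drop_prob.
    keep = [rng.random() >= drop_prob for _ in logits]
    if not any(keep):
        # Always keep at least one expert so the mixture stays defined.
        keep[rng.randrange(len(logits))] = True
    masked = [x if k else -math.inf for x, k in zip(logits, keep)]
    return softmax(masked)


# Hypothetical experts of increasing complexity: (function, cost).
EXPERTS = [
    (lambda x: x, 1.0),                                   # cheap
    (lambda x: math.tanh(2.0 * x), 4.0),                  # medium
    (lambda x: math.tanh(math.tanh(3.0 * x) + x), 16.0),  # expensive
]


def moe_forward(x, logits, drop_prob=0.0, rng=None):
    """Mix expert outputs and report the expected complexity cost."""
    weights = route(logits, drop_prob, rng)
    output = sum(w * f(x) for w, (f, _) in zip(weights, EXPERTS))
    # Assumed form of a complexity bias: routing-weighted expert cost,
    # which a training loss could penalize to favor cheaper experts.
    complexity_penalty = sum(w * c for w, (_, c) in zip(weights, EXPERTS))
    return output, complexity_penalty, weights


if __name__ == "__main__":
    out, penalty, weights = moe_forward(
        0.5, [0.0, 0.0, 0.0], drop_prob=0.3, rng=random.Random(0)
    )
    print(out, penalty, weights)
```

In this sketch, dropout is applied to the router logits rather than to expert outputs, so the surviving experts' weights still sum to one. That is one of several reasonable design choices, and the source does not say which one the authors use.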