4 papers across 2 sessions
Our proposed orthogonality and variance losses improve downstream fine-tuning of Mixture-of-Experts models: they enhance expert specificity, counteracting the expert homogenization induced by load-balancing objectives, while still maintaining load balance. A sketch of what such losses might look like follows below.
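The paper's exact loss definitions aren't reproduced here; the following is a minimal PyTorch sketch of one plausible formulation, assuming flattened per-expert weight vectors and softmax router probabilities (the function names, tensor shapes, and loss forms are illustrative assumptions, not the paper's API):

```python
import torch

def orthogonality_loss(expert_weights: torch.Tensor) -> torch.Tensor:
    """Penalize pairwise similarity between flattened expert weights.

    expert_weights: (num_experts, d), one weight vector per expert.
    Hypothetical formulation; the paper's exact loss may differ.
    """
    w = torch.nn.functional.normalize(expert_weights, dim=-1)
    gram = w @ w.T                                     # cosine similarities, (E, E)
    off_diag = gram - torch.eye(w.size(0), device=w.device)
    return off_diag.pow(2).mean()                      # push off-diagonal terms to 0

def variance_loss(router_probs: torch.Tensor) -> torch.Tensor:
    """Encourage sharp, token-specific routing.

    router_probs: (num_tokens, num_experts) softmax outputs.
    Negated so that minimizing the loss raises per-token variance,
    i.e. more confident expert assignments.
    """
    return -router_probs.var(dim=-1).mean()
```

In this reading, both terms would be added to the fine-tuning objective alongside the usual load-balancing loss, weighted by small coefficients.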
Efficient Long Context Fine-tuning through Dynamic Data Scheduling
A new learning framework that improves LLM performance at inference time by learning from a Mistake Log of errors collected during fine-tuning.
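The blurb doesn't specify what the Mistake Log contains; a hypothetical sketch, assuming it records cases where the model's output diverged from a reference during fine-tuning (`MistakeLog`, `MistakeEntry`, and the exact-match error check are all assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class MistakeEntry:
    prompt: str
    model_output: str
    reference: str

@dataclass
class MistakeLog:
    """Hypothetical container for errors observed during fine-tuning."""
    entries: list[MistakeEntry] = field(default_factory=list)

    def record(self, prompt: str, model_output: str, reference: str) -> None:
        # Naive error criterion; a real system would use a softer check.
        if model_output.strip() != reference.strip():
            self.entries.append(MistakeEntry(prompt, model_output, reference))
```

Entries collected this way could then be retrieved or trained on to steer the model away from its recorded mistakes at inference time.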
Alchemist: a compact (3.3k-sample) SFT dataset built via diffusion-model filtering. It boosts text-to-image aesthetics and complexity across 5 Stable Diffusion models (fine-tuned weights released) while preserving diversity.
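How Alchemist scores candidates isn't detailed in this summary; a minimal sketch of diffusion-model-based filtering, assuming a hypothetical scorer `score_fn` (e.g. a pretrained diffusion model's estimate of aesthetics or complexity) that rates each prompt-image pair, keeping roughly the top 3.3k:

```python
import heapq

def filter_sft_candidates(candidates, score_fn, k=3300):
    """Keep the k highest-scoring (prompt, image) pairs.

    score_fn is an assumed quality scorer; Alchemist's actual
    filtering criterion is not specified in this summary.
    """
    scored = ((score_fn(prompt, image), prompt, image)
              for prompt, image in candidates)
    top = heapq.nlargest(k, scored, key=lambda t: t[0])
    return [(prompt, image) for _, prompt, image in top]
```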