We propose a Human Motion-Vision-Language Model (HMVLM) based on mixture-of-experts (MoE) LoRA for diverse human-centric downstream tasks. A novel "zero expert" is introduced to mitigate catastrophic forgetting during instruction tuning.
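The idea behind a "zero expert" can be sketched as follows: alongside the trainable LoRA experts, the router is given one expert whose update is identically zero, so any routing mass sent to it falls back to the frozen pretrained weights and the original behavior is preserved. This is an illustrative sketch only, not the paper's implementation; all dimensions, names (`forward`, `Wg`), and the routing scheme here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 8, 2, 4  # hidden dim, LoRA rank, experts (last one is the zero expert)

W  = rng.normal(size=(d, d))                         # frozen pretrained weight
A  = rng.normal(size=(n_experts - 1, r, d)) * 0.1    # LoRA down-projections (trainable)
B  = rng.normal(size=(n_experts - 1, d, r)) * 0.1    # LoRA up-projections (trainable)
Wg = rng.normal(size=(d, n_experts))                 # router weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, gates=None):
    """MoE-LoRA layer: y = x W + sum_e g_e * B_e A_e x, with one zero expert."""
    if gates is None:
        gates = softmax(x @ Wg)      # soft routing over all experts
    y = x @ W                        # frozen base path
    for e in range(n_experts - 1):   # only the real LoRA experts add an update
        y = y + gates[e] * (B[e] @ (A[e] @ x))
    # The last expert is the zero expert: it contributes nothing, so routing
    # all mass to it recovers the frozen pretrained mapping x @ W exactly.
    return y

x = rng.normal(size=d)
zero_only = np.eye(n_experts)[-1]    # force all routing mass onto the zero expert
assert np.allclose(forward(x, zero_only), x @ W)
```

Routing to the zero expert is thus a learned way for the model to "do nothing" on inputs where the pretrained behavior should be kept, which is the mechanism the abstract credits with mitigating catastrophic forgetting.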