Poster Session 3 · Thursday, December 4, 2025 11:00 AM → 2:00 PM
#2409 Spotlight
MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series
Abstract
From clinical healthcare to daily living, continuous sensor monitoring across multiple modalities has shown great promise for real-world intelligent decision-making, yet it also faces persistent challenges. In this work, we argue for modeling such heterogeneous data sources under the multimodal paradigm and introduce MAESTRO, a novel framework that overcomes key limitations of existing multimodal learning approaches:
- reliance on a single primary modality for alignment,
- pairwise modeling of modalities, and
- assumption of complete modality observations.
At its core, MAESTRO models dynamic intra- and cross-modal interactions based on task relevance, leveraging symbolic tokenization and adaptive attention budgeting to construct long multimodal sequences that are processed via sparse cross-modal attention. The resulting cross-modal tokens are routed through a sparse Mixture-of-Experts (MoE) mechanism, enabling expert specialization under varying modality combinations.
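To make the routing step concrete, below is a minimal sketch of sparse top-k MoE routing of the kind described above. This is an illustrative implementation under standard assumptions (top-k gating with renormalized softmax weights), not MAESTRO's actual routing function; all names (`sparse_moe`, `gate_w`, `expert_ws`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_moe(tokens, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    tokens:    (n, d) cross-modal token embeddings
    gate_w:    (d, n_experts) gating weights
    expert_ws: list of (d, d) per-expert weight matrices (illustrative linear experts)
    """
    logits = tokens @ gate_w                   # (n, n_experts) gating scores
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        sel = topk[i]
        weights = softmax(logits[i, sel])      # renormalize over selected experts only
        for w, e in zip(weights, sel):
            out[i] += w * (tok @ expert_ws[e]) # sparse: only k experts run per token
    return out

# Toy usage: 5 tokens of dimension 8, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
gate_w = rng.normal(size=(8, 4))
experts = [rng.normal(size=(8, 8)) for _ in range(4)]
y = sparse_moe(tokens, gate_w, experts, k=2)
```

Because only k of the experts execute per token, compute stays roughly constant as the expert pool grows, which is what allows specialization across many modality combinations without a proportional cost increase.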
We evaluate MAESTRO against 10 baselines on four diverse datasets spanning three applications, and observe average relative improvements of 4% and 8% over the best existing multimodal and multivariate approaches, respectively, under complete observations. Under partial observations, with up to 40% of modalities missing, MAESTRO achieves an average 9% improvement. Further analysis demonstrates the robustness and efficiency of MAESTRO's sparse, modality-aware design for learning from dynamic time series.