Large Multimodal Models; Shuffle Learning; Directed Tokens

1 paper across 1 session

Poster Session 6

Friday, December 5, 2025 · 4:30 PM → 7:30 PM

Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models

#4611 · Thanh-Dat Truong, Huu-Thien Tran, Tran Son, Bhiksha Raj, Khoa Luu

This paper introduces a simple but efficient learning mechanism that improves the robustness of the alignment between visual and textual modalities by solving shuffling problems.