2 papers across 2 sessions
We propose a text-aligned visual representation that unifies visual understanding and generation within a single multimodal large language model (MLLM).