2 papers across 2 sessions
We introduce MoCha, the first model for dialogue-driven movie shot generation.
We present OmniTalker, the first end-to-end framework for jointly generating speech and talking-head video from text. It synthesizes at 25 FPS while preserving speaker identity and keeping the audio and visual outputs synchronized in one-shot settings.