We present OmniTalker, the first end-to-end framework for joint text-driven speech and talking head generation. It achieves 25 FPS synthesis while preserving speaker identity and synchronizing audiovisual outputs in one-shot settings.