Assistant Professor, University of Connecticut
1 paper at NeurIPS 2025
We have built a highly modular, multimodal general-purpose agent that can interact with a computer via text, images, audio, and video.