2 papers across 2 sessions
An end-to-end learnable tokenizer for Vision Transformers that improves spatial and semantic learning, and allows pretrained models to be retrofitted to use pixel-level tokens.
We propose an offline goal-conditioned RL algorithm that achieves state-of-the-art performance on complex, long-horizon tasks without needing hierarchical policies or generative subgoal models.