2 papers across 2 sessions
An end-to-end learnable tokenizer for Vision Transformers that enhances spatial and semantic learning by allowing retrofitting of pretrained models to use pixel-level tokens
Common prior knowledge can distort edge weights in DAG structure learning, creating characteristic graph patterns. Our framework detects these patterns as signals to adaptively adjust learning process, improving prior integration and DAG accuracy.