1 paper across 1 session
The usual way of rasterizing images into 1D sequences for long-sequence models, row-major order, is sub-optimal! We show that orders other than row-major can be better, and provide an RL method to learn the optimal ordering.
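To make "ordering" concrete, here is a minimal sketch of flattening the same small image under three different pixel orders. The function names (`row_major`, `column_major`, `snake`, `flatten`) and the 3x3 toy image are illustrative assumptions, not taken from the paper, and the learned RL ordering itself is not shown.

```python
from typing import List, Tuple

Order = List[Tuple[int, int]]  # sequence of (row, col) positions


def row_major(h: int, w: int) -> Order:
    # Standard raster scan: left to right, top to bottom.
    return [(r, c) for r in range(h) for c in range(w)]


def column_major(h: int, w: int) -> Order:
    # Scan each column top to bottom, moving left to right.
    return [(r, c) for c in range(w) for r in range(h)]


def snake(h: int, w: int) -> Order:
    # Like row-major, but every other row is reversed so that
    # consecutive sequence elements are always spatial neighbours.
    order: Order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def flatten(image: List[List[int]], order: Order) -> List[int]:
    # Read pixels out of the 2D image in the given order.
    return [image[r][c] for r, c in order]


# Hypothetical 3x3 image whose pixel values equal their row-major index.
image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]

print(flatten(image, row_major(3, 3)))
print(flatten(image, column_major(3, 3)))
print(flatten(image, snake(3, 3)))
```

Each order is just a permutation of the same pixel positions, so a sequence model sees identical content in a different arrangement; the claim above is that this arrangement can affect how well the model does.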