5 papers across 3 sessions
We propose a method for modeling dynamical systems that bridges efficient latent space modeling with entity tracking by introducing identifier representations, which maintain entity traceability within a latent system representation.
We introduce TiledFlashLinearAttention, a faster kernel algorithm for Linear RNNs and mLSTMs based on improved sequence parallelism.
We propose GyroSwin, a 5D Swin Transformer that learns a 5D PDE commonly encountered in gyrokinetics, enabling the simulation of turbulence in a nuclear fusion reactor.
We propose EVA, a parameter-efficient fine-tuning method that initializes LoRA weights in a variance-optimal manner and performs adaptive rank allocation to provably maximize the expected gradient signal at the beginning of fine-tuning.