8 papers across 3 sessions
We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer, FLOP savings when training deep models, and a wider range of compute-efficient width/depth ratios.
We extend DeltaNet by using products of Householder matrices as state-transition matrices, allowing us to trade off expressivity against computational cost.
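The core object here, a product of Householder matrices, can be illustrated with a minimal NumPy sketch. This is illustrative only: it uses plain orthogonal reflections, whereas the paper's learned state-transition matrices are generalized (parameterized) Householder products.

```python
import numpy as np

def householder(v):
    """Householder reflection H = I - 2 v v^T / ||v||^2 (orthogonal, symmetric)."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

rng = np.random.default_rng(0)
d, k = 4, 2  # state dimension, number of reflections in the product
vs = [rng.standard_normal(d) for _ in range(k)]

# Product of k Householder reflections; k controls how expressive the
# resulting transition matrix is (k = d can reach any orthogonal matrix,
# while each extra factor adds computational cost).
A = np.linalg.multi_dot([householder(v) for v in vs])
```

Since each factor is orthogonal, the product A is orthogonal as well, which keeps the recurrent state norm stable.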
The paper derives generalization bounds for selective SSMs using connections to self-attention, showing that spectral properties of the state matrix influence generalization.
We propose a novel algorithm, called NGN-M, with a strong theoretical convergence analysis and extensive numerical evaluations demonstrating its robustness to the choice of the learning-rate hyperparameter.
A dataset containing neural activity and finger kinematics from 303 sessions of a monkey performing a 2-DOF finger movement task, recorded over a 1242-day (~3.5-year) timespan.