5 papers across 3 sessions
We demonstrate that the PFN framework enables accurate estimation of causal effects under weakened assumptions.
This paper presents a holistic and approximate normalization approach that accelerates GPT training by up to 40% while eliminating the need for weight decay and learning rate warm-up.
We extend DeltaNet by using products of Householder matrices as state-transition matrices, allowing us to trade off expressivity against computational complexity.
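As a minimal NumPy sketch of the idea (illustrative only; the paper's exact parameterization of DeltaNet's transition matrices may differ): each Householder reflection is an orthogonal rank-1 update, and multiplying k of them yields a richer orthogonal state-transition matrix, with k controlling the expressivity/cost trade-off.

```python
import numpy as np

def householder(v: np.ndarray) -> np.ndarray:
    # Householder reflection H = I - 2 v v^T / (v^T v): orthogonal, det = -1.
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

def householder_product(vs: list[np.ndarray]) -> np.ndarray:
    # Product of k reflections; k = 1 is a single rank-1-style update,
    # larger k gives a more expressive (but costlier) transition matrix.
    A = np.eye(len(vs[0]))
    for v in vs:
        A = A @ householder(v)
    return A

rng = np.random.default_rng(0)
A = householder_product([rng.normal(size=4) for _ in range(3)])
# A is orthogonal: A @ A.T == I up to floating-point error.
```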
We introduce TabArena, the first continuously maintained living benchmarking system for machine learning on tabular data.
We introduce the Gompertz Linear Unit (GoLU), a novel self-gated activation function with superior performance on a diverse range of tasks.
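A minimal sketch of the self-gating idea, assuming GoLU multiplies the input by its Gompertz gate g(x) = exp(-exp(-x)) (the paper may additionally introduce learnable scale/shape parameters):

```python
import math

def gompertz(x: float) -> float:
    # Gompertz function: a smooth, asymmetric sigmoid mapping R -> (0, 1).
    return math.exp(-math.exp(-x))

def golu(x: float) -> float:
    # Self-gating: the input is scaled by its own Gompertz gate
    # (assumed form, analogous to GELU/SiLU-style gated activations).
    return x * gompertz(x)
```

For large positive x the gate approaches 1 (near-identity), while for large negative x it decays toward 0, giving the smooth, non-monotonic shape typical of self-gated activations.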