Full Professor, LMU Munich
1 paper at NeurIPS 2025
Standard Glorot initialization becomes unstable when applied to RNNs on long sequences, causing hidden states to explode. To address this, we propose a simple rescaling that mitigates the instability.
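
The abstract does not say what form the rescaling takes, so the following is only an illustrative sketch, not the paper's method. It assumes a plain linear recurrence h <- W h and a hypothetical rescaling that shrinks a Glorot-initialized recurrent matrix to a chosen spectral radius. The function names and the value `target_radius=0.95` are assumptions made for this example.

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng):
    """Standard Glorot (Xavier) uniform init: U(-a, a), a = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def spectral_radius(w):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(w))))

def rescale_to_radius(w, target_radius):
    """Hypothetical rescaling (an assumption, not the paper's rule):
    multiply W by a scalar so its spectral radius equals target_radius."""
    return w * (target_radius / spectral_radius(w))

def final_state_norm(w, h0, steps):
    """Norm of the hidden state after `steps` applications of h <- W h."""
    h = h0.copy()
    for _ in range(steps):
        h = w @ h
    return float(np.linalg.norm(h))

rng = np.random.default_rng(0)
n = 64                                   # hidden size, chosen arbitrarily
w = glorot_uniform(n, n, rng)            # recurrent weight matrix
w_rescaled = rescale_to_radius(w, target_radius=0.95)
h0 = rng.standard_normal(n)              # arbitrary initial hidden state

print("spectral radius (Glorot):  ", spectral_radius(w))
print("spectral radius (rescaled):", spectral_radius(w_rescaled))
print("state norm after 2000 steps (rescaled):", final_state_norm(w_rescaled, h0, 2000))
```

In this linear setting, a spectral radius below 1 makes repeated application of W contract the hidden state over long sequences, whereas a radius above 1 would eventually make it grow. Nonlinearities, inputs, and gating change the picture, so this only illustrates why the scale of the recurrent matrix matters.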