Standard Glorot initialization becomes unstable when applied to recurrent neural networks (RNNs) on long sequences, causing the hidden states to explode. To address this, we propose a simple rescaling that mitigates the instability.
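
The following is a minimal sketch for probing how hidden-state norms evolve under Glorot initialization. It assumes a linear recurrence h_t = W h_{t-1} + x_t with standard normal inputs, which is an illustrative choice since the abstract does not specify the architecture. `glorot_uniform` follows the standard Glorot uniform rule, W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The helper `hidden_norms` and the `scale` argument are hypothetical: `scale` is only a placeholder for experimenting with a multiplicative factor and is not the rescaling proposed here.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng, scale=1.0):
    """Sample a fan_out x fan_in matrix from U(-a, a).

    a = scale * sqrt(6 / (fan_in + fan_out)); scale=1.0 is standard Glorot.
    The `scale` argument is a hypothetical knob, not the paper's method.
    """
    limit = scale * math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def hidden_norms(w, steps, rng):
    """Run h_t = W h_{t-1} + x_t with x_t ~ N(0, I); return ||h_t|| per step.

    Assumption: a linear recurrence with no nonlinearity or gating.
    """
    n = len(w)
    h = [0.0] * n
    norms = []
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # One recurrent step: matrix-vector product plus the new input.
        h = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + x_i
             for row, x_i in zip(w, x)]
        norms.append(math.sqrt(sum(v * v for v in h)))
    return norms


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 32
w = glorot_uniform(n, n, rng)
norms = hidden_norms(w, steps=200, rng=rng)
print(f"||h_1|| = {norms[0]:.2f}, ||h_200|| = {norms[-1]:.2f}")
```

Comparing the first and last printed norms for different sequence lengths, or for different values of `scale`, gives a rough picture of how sensitive the hidden state is to the initialization scale under these assumptions.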