Standard Glorot (Xavier) initialization becomes unstable in RNNs applied to long sequences, causing hidden states to explode. To address this, we propose a simple rescaling that mitigates this instability.
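The following is a minimal sketch of why the instability can arise. It assumes a purely linear recurrence h_t = W h_{t-1} with a square n×n recurrent matrix, since the RNN cell is not specified here. For a square matrix, Glorot uniform gives each entry variance 2/(n+n) = 1/n, so by the circular law the eigenvalues of W fill a disk of radius close to 1. The recurrence therefore sits at the edge of stability, and a per-step growth factor even slightly above 1 compounds over a long sequence. The function `rescale_to_target` and its `target` parameter are hypothetical: one simple rescaling chosen for illustration, not the rescaling proposed in this work, whose exact form is not given in this passage.

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: entries ~ U(-a, a), a = sqrt(6 / (fan_in + fan_out)),
    # i.e. variance 2 / (fan_in + fan_out), which is 1/n for a square n x n matrix.
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def matvec(w, h):
    # Dense matrix-vector product W @ h using plain lists.
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in w]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def log_growth_rate(w, steps, seed):
    # Average per-step log growth of ||h_t|| under the linear recurrence
    # h_t = W h_{t-1} (assumption: no nonlinearity). The state is renormalized
    # every step so long runs cannot overflow; the summed log-norms record growth.
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(len(w))]
    h0_norm = norm(h)
    h = [x / h0_norm for x in h]
    total = 0.0
    for _ in range(steps):
        h = matvec(w, h)
        step_norm = norm(h)
        total += math.log(step_norm)
        h = [x / step_norm for x in h]
    return total / steps


def rescale_to_target(w, rate, target):
    # HYPOTHETICAL rescaling (illustration only, not the paper's method):
    # multiply W by a constant so its estimated per-step growth factor
    # exp(rate) becomes `target`, e.g. slightly below 1.
    c = target / math.exp(rate)
    return [[c * x for x in row] for row in w]


n, steps = 64, 300
w = glorot_uniform(n, n, random.Random(0))
rate = log_growth_rate(w, steps, seed=1)
print(f"Glorot: per-step growth ~{math.exp(rate):.4f}, "
      f"total x{math.exp(rate * steps):.3g} over {steps} steps")

w_scaled = rescale_to_target(w, rate, target=0.95)
rate_scaled = log_growth_rate(w_scaled, steps, seed=1)
print(f"Rescaled: per-step growth ~{math.exp(rate_scaled):.4f}")
```

Because the recurrence is linear, multiplying W by a positive constant c shifts the log growth rate by exactly log c, which is why the rescaled run lands on the chosen target. With a saturating nonlinearity such as tanh the hidden state stays bounded, so this sketch only illustrates the linear regime.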