We find bigram subnetworks in Transformer language models that are critical to model performance.