PhD student, University of Texas at Austin
1 paper at NeurIPS 2025
We propose SAS, which simulates a larger number of attention heads and a larger hidden size per head to improve performance while keeping the original model size.