MS student, Computer Science Department, Technion-Israel Institute of Technology
1 paper at NeurIPS 2025
An architectural optimization method that accelerates inference by replacing sequential computation in LLMs with parallel computation while preserving accuracy.