PhD student, Weizmann Institute, Technion
1 paper at NeurIPS 2025
An architectural optimization method that accelerates LLM inference by replacing sequential computation with parallel computation while preserving accuracy.