An architectural optimization method that accelerates LLM inference by replacing sequential computation with parallel computation while preserving accuracy.
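One way this sequential-to-parallel trade can work (a minimal sketch, not the paper's actual method) is Jacobi-style parallel decoding: instead of generating tokens strictly one after another, all positions are updated in parallel from the previous iterate until the sequence reaches a fixed point, which is guaranteed to match the sequential output. The `next_token` function here is a hypothetical toy stand-in for a deterministic greedy LLM step.

```python
from typing import List

def next_token(prefix: List[int]) -> int:
    # Hypothetical toy "model": deterministically maps a prefix to the next token.
    return sum(prefix) % 7

def decode_sequential(prompt: List[int], n: int) -> List[int]:
    # Baseline: n strictly sequential decoding steps.
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def decode_jacobi(prompt: List[int], n: int) -> List[int]:
    # Start from an arbitrary guess for all n positions at once.
    guess = [0] * n
    while True:
        # Every position is recomputed "in parallel" from the previous iterate;
        # on hardware, these n calls could run as one batched forward pass.
        new = [next_token(prompt + guess[:i]) for i in range(n)]
        if new == guess:  # fixed point reached: identical to sequential output
            return new
        guess = new
```

Position `i` becomes correct by iteration `i + 1`, so the loop converges in at most `n + 1` parallel steps; the win comes when each parallel step is cheaper than `n` sequential ones, and the fixed-point check is what preserves exactness.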