6 papers across 3 sessions
DIET makes LLMs more token-efficient by using problem difficulty to dynamically guide compression during reinforcement learning, boosting reasoning performance and enabling superior inference scaling under fixed budgets.
We propose a new distillation approach that removes the input question, enabling adaptive and efficient reasoning.
We train a token-level neural router that lets an SLM follow LLM reasoning paths by replacing only the divergent tokens.