3 papers across 3 sessions
We make Chain-of-Thought reasoning in large language models (1) more efficient, by generating implicit reasoning with lightweight language models, and (2) still effective, since the implicit reasoning maintains semantic alignment with the ground-truth reasoning.
A theoretical analysis of scheduling algorithms for LLM queries with latency constraints under RadixAttention, along with a novel scheduling algorithm.
We propose two scalable DP algorithms for high-dimensional sparse variable selection, leveraging modern mixed-integer programming techniques.