6 papers across 3 sessions
A framework that uses Monte Carlo Tree Search to guide an LLM in asking information-seeking questions, learning from past successful strategies to solve problems more efficiently and accurately.
We use LLMs to create state-of-the-art AI planners.
We present an algorithm for test-time scaling of SDE-based diffusion models that searches for noise trajectories optimizing arbitrary rewards, empirically matching or exceeding MCTS performance.
A self-supervised method that improves open-weight value models using state-transition dynamics, enabling reward-free, efficient search with performance comparable to search with costly large models and tree-based methods.
We propose DISC, a dynamic decomposition method that adaptively adjusts step sizes during LLM inference to allocate compute more efficiently, significantly improving performance and sample efficiency across reasoning and code generation benchmarks.