1 paper across 1 session
We propose PlanU, a method that enhances LLM-based decision-making under uncertainty by modeling value distributions via quantile regression and guiding MCTS exploration using a novel Upper Confidence Bounds with Curiosity (UCC) score.
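
To make the two ingredients named above more concrete, here is a minimal sketch in Python. It shows the standard quantile-regression (pinball) loss, which is one common way to fit a set of quantiles of a value distribution, and a generic UCB-style selection score with an additive novelty bonus. The function names `pinball_loss` and `ucb_with_bonus`, the additive form of the bonus, and the constants `c` and `beta` are all illustrative assumptions; the summary does not state the actual UCC formula, so this should not be read as PlanU's implementation.

```python
import math


def pinball_loss(quantile_preds, taus, target):
    """Average quantile-regression (pinball) loss for one scalar target.

    quantile_preds[i] is the predicted value of the taus[i]-quantile.
    """
    total = 0.0
    for q, tau in zip(quantile_preds, taus):
        diff = target - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(taus)


def ucb_with_bonus(mean_value, parent_visits, child_visits, bonus, c=1.0, beta=1.0):
    """UCB-style score plus an additive novelty bonus.

    Hypothetical stand-in for a curiosity-guided score; NOT the paper's UCC.
    """
    if child_visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + explore + beta * bonus
```

In an MCTS loop, a score like `ucb_with_bonus` would be evaluated for each child of a node and the child with the highest score selected; `mean_value` could, for instance, be the mean of the predicted quantiles, though that choice is also an assumption.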