4 papers across 3 sessions
We introduce a theoretically grounded distributional RL algorithm for LLM post-training that improves upon prior work on both synthetic and mathematical reasoning tasks.
FreshStack is a framework for building realistic information retrieval (IR) and retrieval-augmented generation (RAG) evaluation benchmarks in niche and recent domains, using community-asked questions and answers.