1 paper across 1 session
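We introduced Multi-agent KTO, a method that trains a large language model (LLM) to play Werewolf by learning directly from gameplay. Our approach outperforms both GPT-4o and methods that combine reinforcement learning (RL) with LLMs, and it reaches performance competitive with human players.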
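KTO stands for Kahneman-Tversky Optimization, which trains on single responses labeled desirable or undesirable instead of on preference pairs. The sketch below computes that standard KTO loss over individual in-game actions. It assumes each action's log-probability under the trained policy and under a frozen reference model is already available. The `Sample` type, the desirable/undesirable labels, and the default `beta` and `lambda` values are illustrative assumptions. How Multi-agent KTO actually labels actions from Werewolf games is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    policy_logp: float  # log pi_theta(y | x): sequence log-prob of action y under the trained policy
    ref_logp: float     # log pi_ref(y | x): same action under the frozen reference model
    desirable: bool     # hypothetical label: was this in-game action judged "good"?


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def kto_loss(samples, z0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Mean KTO loss over a batch of labeled actions.

    z0 is the reference point, an estimate of KL(pi_theta || pi_ref) that is
    treated as a constant (no gradient) and clamped to be non-negative.
    """
    z0 = max(z0, 0.0)
    total = 0.0
    for s in samples:
        # Implied reward: log(pi_theta / pi_ref) for this action.
        r = s.policy_logp - s.ref_logp
        if s.desirable:
            # Reward desirable actions for rising above the reference point.
            value = lambda_d * sigmoid(beta * (r - z0))
            total += lambda_d - value
        else:
            # Reward undesirable actions for falling below the reference point.
            value = lambda_u * sigmoid(beta * (z0 - r))
            total += lambda_u - value
    return total / len(samples)


batch = [
    Sample(policy_logp=-5.0, ref_logp=-6.0, desirable=True),   # e.g. a helpful accusation
    Sample(policy_logp=-4.0, ref_logp=-3.5, desirable=False),  # e.g. a vote that exposed a teammate
]
print(kto_loss(batch, z0=0.0))
```

Because each action carries its own label, gameplay outcomes can supervise the model without pairing every good move with a matching bad one. That is one plausible reason a KTO-style objective suits multi-agent game logs.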