Poster Session 4 · Thursday, December 4, 2025 4:30 PM → 7:30 PM
#312
Factorio Learning Environment
Abstract
Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), an environment based on the game Factorio that tests agents on long-term planning, spatial reasoning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges, from basic automation to complex factories processing millions of resource units per second.
We provide two settings:
- open-play, with the open-ended task of building the largest factory on a procedurally generated map
- lab-play, consisting of 33 bounded tasks across three settings with fixed resources
We demonstrate across both settings that models still lack strong spatial reasoning. In lab-play, we find that LLMs exhibit promising short-horizon skills, yet are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g., electric-powered drilling), they fail to achieve complex automation (e.g., electronic-circuit manufacturing).