2 papers across 2 sessions
We propose a benchmark to evaluate LLMs' spatiotemporal reasoning capabilities.
We track objects through transformations while detecting and describing the resulting state changes.