automation

2 papers across 1 session

Poster Session 4

Thursday, December 4, 2025 · 4:30 PM → 7:30 PM

Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity

#1305 · Qiyao Wei, Edward R Morrell, Lea Goetz, Mihaela van der Schaar

This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for LLM outputs, achieving cross-domain scalability without relying on human judgment.

Factorio Learning Environment

#312 · Jack Hopkins, Mart Bakler, Akbir Khan

Factorio Learning Environment is an evaluation for frontier models that offers exponentially scaling challenges.