Researcher, GSK plc
1 paper at NeurIPS 2025
This paper introduces a novel method for generating benchmarks that evaluate semantic similarity methods for LLM outputs; the method scales across domains and does not rely on human judgment.