2 papers across 2 sessions
ClinBench is an open-source, multi-model, multi-domain framework for rigorously benchmarking large language models on clinical information-extraction tasks.
We propose a cognitive-science-inspired framework and benchmark to systematically evaluate the learning abilities of large language models.