Instruction Tuning Large Language Models to Understand Electronic Health Records

Zhenbang Wu, Anant Dadu, Michael Nalls, Faraz Faghri, Jimeng Sun

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in solving diverse tasks by following human instructions. However, it is challenging to develop a conversational AI assistant for electronic health record (EHR) data because (1) there is no large-scale instruction-following dataset and (2) existing model architectures are ineffective at handling complex and heterogeneous EHR data. Our paper introduces MIMIC-Instr, a dataset comprising over 400K open-ended instruction-following examples based on the MIMIC-IV EHR database. This dataset covers a broad range of topics and can be used to instruction-tune general-purpose LLMs for diverse clinical use cases. Additionally, we propose Llemr, a general framework designed to enable LLMs to process and interpret EHRs with complex data schemas effectively. Llemr exhibits competitive capabilities in answering diverse patient-related questions based on EHR data. Furthermore, our evaluations on clinical predictive modeling benchmarks show that the fine-tuned Llemr can match the performance of state-of-the-art (SOTA) baselines that use curated features.