Machine learning to predict notes for chart review in the oncology setting: a proof of concept strategy for improving clinician note-writing

Abstract

Objective: Leverage electronic health record (EHR) audit logs to develop a machine learning (ML) model that predicts which notes a clinician wants to review when seeing oncology patients.

Materials and Methods: We trained logistic regression models using note metadata and a term frequency-inverse document frequency (TF-IDF) text representation. We evaluated performance with precision, recall, F1, AUC, and a clinical qualitative assessment.

Results: The metadata-only model achieved an AUC of 0.930, and the model combining metadata and TF-IDF achieved an AUC of 0.937. Qualitative assessment revealed a need for better text representation and for further customization of predictions to the user.

Discussion: Our model effectively surfaces the top 10 notes a clinician wants to review when seeing an oncology patient. Further studies can characterize different types of clinician users and better tailor the task for different care settings.

Conclusion: EHR audit logs can provide important relevance data for training ML models that assist with note-writing in the oncology setting.
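
To make the approach in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hand-made dataset in which each note has two hypothetical metadata flags (`same_specialty`, `recent`), free text, and a `viewed` label standing in for relevance derived from audit logs. It builds TF-IDF features, fits logistic regression by plain gradient descent, and reports AUC, precision, recall, F1, and the top-ranked notes. All names, features, and hyperparameters are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy data: (same_specialty, recent, note text, viewed label).
NOTES = [
    (1, 1, "oncology progress note chemotherapy cycle", 1),
    (1, 1, "oncology consult note staging biopsy", 1),
    (1, 0, "radiology report ct chest tumor", 1),
    (0, 1, "dermatology visit note rash", 0),
    (0, 1, "nursing note vitals stable", 0),
    (0, 0, "ophthalmology exam routine screening", 0),
]

def tfidf_vectors(docs):
    """Return (vocab, vectors) with one smoothed TF-IDF vector per tokenized doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Predicted probability that a note is viewed; the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def precision_recall_f1(scores, labels, threshold=0.5):
    """Precision, recall, and F1 after thresholding the predicted probabilities."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
    fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
    fn = sum(1 for p, l in zip(preds, labels) if p == 0 and l == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Concatenate the metadata flags with the TF-IDF vector for each note.
docs = [text.split() for _, _, text, _ in NOTES]
vocab, text_vecs = tfidf_vectors(docs)
X = [[same, recent] + vec for (same, recent, _, _), vec in zip(NOTES, text_vecs)]
labels = [viewed for _, _, _, viewed in NOTES]

# Scores are computed on the training notes only to keep the sketch short;
# a real evaluation would use held-out patients or encounters.
weights = train_logreg(X, labels)
scores = [predict(weights, x) for x in X]
auc_value = auc(scores, labels)
precision, recall, f1 = precision_recall_f1(scores, labels)

# Rank notes by predicted relevance and keep the top k (k=2 here, 10 in the paper).
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(f"AUC={auc_value:.3f} P={precision:.2f} R={recall:.2f} F1={f1:.2f} top-2={top_k}")
```

The sketch uses only the standard library so that each step is visible. In practice one would more likely reach for an established library such as scikit-learn for the vectorizer, the classifier, and the metrics.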

Publication
Journal of the American Medical Informatics Association (JAMIA)
Sharon Jiang
Master’s student

Harvard medical student

Monica Agrawal
PhD Student

Assistant Professor, Duke

Shannon Shen
PhD Student

My research lies at the intersection of NLP and HCI. I am interested in understanding language in scientific, legal, and clinical documents that are authored and used by domain experts. I study how newly developed NLP approaches can enable better human-AI collaboration to assist experts in these high-stakes settings.

David Sontag
Professor of EECS

My research focuses on advancing machine learning and artificial intelligence, and using these to transform health care.

Related