Large-Scale Study of Temporal Shift in Health Insurance Claims

Abstract

Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are deployed in the near future, dataset shift over time may result in less than ideal performance. To capture this phenomenon, we consider a task–that is, an outcome to be predicted at a particular time point–to be non-stationary if a historical model is no longer optimal for predicting that outcome. We build an algorithm to test for temporal shift either at the population level or within a discovered sub-population. Then, we construct a meta-algorithm to perform a retrospective scan for temporal shift on a large collection of tasks. Our algorithms enable us to perform the first comprehensive evaluation of temporal shift in healthcare to our knowledge. We create 1,010 tasks by evaluating 242 healthcare outcomes for temporal shift from 2015 to 2020 on a health insurance claims dataset. 9.7% of the tasks show temporal shifts at the population level, and 93.0% have some sub-population affected by shifts. We dive into case studies to understand the clinical implications. Our analysis highlights the widespread prevalence of temporal shifts in healthcare.

Publication
Conference on Health, Inference, and Learning
Christina X Ji
Christina X Ji
PhD Student

Christina is interested in applying machine learning to healthcare, detecting distribution shift and developing transfer learning algorithms, and evaluating treatments and reinforcement learning policies with causal inference.

David Sontag
David Sontag
Professor of EECS

My research focuses on advancing machine learning and artificial intelligence, and using these to transform health care.

Related