Performance and utility trade-off in interpretable sleep staging

Abstract

Recent advances in deep learning have led to the development of models approaching the human level of accuracy. However, healthcare remains an area lacking in widespread adoption. The safety-critical nature of healthcare results in a natural reticence to put these black-box deep learning models into practice. This paper explores interpretable methods for a clinical decision support system called sleep staging, an essential step in diagnosing sleep disorders. Clinical sleep staging is an arduous process requiring manual annotation for each 30s of sleep using physiological signals such as electroencephalogram (EEG). Recent work has shown that sleep staging using simple models and an exhaustive set of features can perform nearly as well as deep learning approaches but only for some specific datasets. Moreover, the utility of those features from a clinical standpoint is ambiguous. On the other hand, the proposed framework, NormIntSleep demonstrates exceptional performance across different datasets by representing deep learning embeddings using normalized features. NormIntSleep performs 4.5% better than the exhaustive feature-based approach and 1.5% better than other representation learning approaches. An empirical comparison between the utility of the interpretations of these models highlights the improved alignment with clinical expectations when performance is traded-off slightly. NormIntSleep paired with a clinically meaningful set of features can best balance this trade-off by providing reliable, clinically relevant interpretation with robust performance.

Publication
In NeurIPS 2022 Workshop on Learning from Time Series for Health
Irfan Al-Hussaini
Irfan Al-Hussaini
Senior Data Scientist

My research interests include interpretable machine learning and natural language processing.