Over the last few years, research in automatic sleep scoring has mainly focused on developing increasingly complex deep learning architectures. However, these approaches have recently achieved only marginal improvements, often at the cost of more data and more expensive training procedures. Despite all these efforts and their satisfactory performance, automatic sleep staging solutions have not yet been widely adopted in clinical practice. We argue that most deep learning solutions for sleep scoring have limited real-world applicability because they are hard to train, deploy, and reproduce. Moreover, they lack interpretability and transparency, which are often key to increasing adoption rates. In this work, we revisit the problem of sleep stage classification using classical machine learning. Results show that state-of-the-art performance can be achieved with a conventional machine learning pipeline consisting of preprocessing, feature extraction, and a simple machine learning model. In particular, we analyze the performance of a linear model and a non-linear (gradient boosting) model. Our approach surpasses the state of the art (among methods that use the same data) on two public datasets, Sleep-EDF SC-20 (macro F1 score, MF1 = 0.810) and Sleep-EDF ST (MF1 = 0.795), while achieving competitive results on Sleep-EDF SC-78 (MF1 = 0.775) and MASS SS3 (MF1 = 0.817). We show that, for the sleep stage scoring task, the expressiveness of an engineered feature vector is on par with the internally learned representations of deep learning models. This observation opens the door to clinical adoption, as a representative feature vector makes it possible to leverage both the interpretability and the successful track record of traditional machine learning models.
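The abstract does not specify the exact preprocessing, features, or models used, so the following is only a minimal sketch of what a "preprocessing, feature extraction, simple model" pipeline and the MF1 metric can look like. It makes several assumptions: a single-channel EEG signal, non-overlapping 30-second epochs, per-epoch mean removal as the only preprocessing step, and relative spectral band powers as features. The band edges and the helper names (`epoch_signal`, `band_power_features`, `macro_f1`) are hypothetical and chosen for illustration. MF1 is computed as the unweighted mean of the per-class F1 scores.

```python
import numpy as np

# Hypothetical frequency bands in Hz, half-open [low, high). These edges are a
# common convention chosen for illustration, not taken from the paper.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "sigma": (12.0, 15.0),
    "beta": (15.0, 30.0),
}


def epoch_signal(x, fs: float, epoch_sec: float = 30.0) -> np.ndarray:
    """Split a 1-D signal into non-overlapping epochs, dropping any trailing remainder."""
    n = int(round(fs * epoch_sec))
    n_epochs = len(x) // n
    return np.asarray(x[: n_epochs * n], dtype=float).reshape(n_epochs, n)


def band_power_features(epochs: np.ndarray, fs: float) -> np.ndarray:
    """Relative spectral power per band for each epoch, shape (n_epochs, n_bands)."""
    # Minimal preprocessing (an assumption): remove the per-epoch mean.
    centered = epochs - epochs.mean(axis=1, keepdims=True)
    freqs = np.fft.rfftfreq(epochs.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2  # periodogram up to a constant
    lo = min(b[0] for b in BANDS.values())
    hi = max(b[1] for b in BANDS.values())
    total = power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
    total = np.where(total > 0, total, 1.0)  # avoid division by zero on flat epochs
    feats = [
        power[:, (freqs >= a) & (freqs < b)].sum(axis=1) / total
        for a, b in BANDS.values()
    ]
    return np.stack(feats, axis=1)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Each label occurs in y_true or y_pred, so the denominator is positive.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    fs = 100.0
    t = np.arange(int(fs * 30 * 3 + 50)) / fs
    x = np.sin(2 * np.pi * 10.0 * t)  # synthetic 10 Hz signal, falls in the alpha band
    X = band_power_features(epoch_signal(x, fs), fs)
    print(X.shape)  # (3, 5)
    print(macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

The resulting feature matrix could then be passed to any simple classifier, for example a linear model such as scikit-learn's `LogisticRegression` or a gradient boosting model such as `HistGradientBoostingClassifier`. This choice is an illustration of the two model families named above, not the authors' configuration.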