Over the last few years, research in automatic sleep scoring has mainly focused on developing increasingly complex deep learning architectures. Recently, however, these approaches have achieved only marginal improvements, often at the cost of more data and more expensive training procedures. Despite these efforts and their satisfactory performance, automatic sleep staging solutions are not yet widely adopted in a clinical context. We argue that most deep learning solutions for sleep scoring are limited in their real-world applicability because they are hard to train, deploy, and reproduce. Moreover, these solutions lack interpretability and transparency, which are often key to increasing adoption rates. In this work, we revisit the problem of sleep stage classification using classical machine learning. Results show that competitive performance can be achieved with a conventional machine learning pipeline consisting of preprocessing, feature extraction, and a simple machine learning model. In particular, we analyze the performance of a linear model and a non-linear (gradient boosting) model. Our approach surpasses the state of the art (among methods that use the same data) on two public datasets, Sleep-EDF SC-20 (MF1 0.810) and Sleep-EDF ST (MF1 0.795), while achieving competitive results on Sleep-EDF SC-78 (MF1 0.775) and MASS SS3 (MF1 0.817). We show that, for the sleep stage scoring task, the expressiveness of an engineered feature vector is on par with the internally learned representations of deep learning models. This observation opens the door to clinical adoption, as a representative feature vector makes it possible to leverage both the interpretability and the successful track record of traditional machine learning models.
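To make the reported metric concrete, the sketch below reads MF1 as the macro-averaged F1 score, that is, the unweighted mean of the per-class F1 scores, so that rare stages count as much as frequent ones. The abstract does not spell out this definition or the label set, so the five-stage list `STAGES`, the helper `macro_f1`, and the toy label sequences are all illustrative assumptions rather than the authors' evaluation code.

```python
from collections import Counter

# Five sleep stages, assumed here for illustration only.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def macro_f1(y_true, y_pred, labels=STAGES):
    """Return the unweighted mean of per-class F1 scores over `labels`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention chosen for this sketch: a class absent from both
        # sequences contributes an F1 of 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sequence of ten scored epochs (made-up labels, not real data).
y_true = ["W", "W", "N1", "N2", "N2", "N2", "N3", "N3", "REM", "REM"]
y_pred = ["W", "N1", "N1", "N2", "N2", "N3", "N3", "N3", "REM", "N2"]

score = macro_f1(y_true, y_pred)
print(f"MF1 = {score:.3f}")
```

Because every class carries equal weight, a model that scores the dominant stage well but misses a rare one is penalized more by this metric than by plain accuracy, which is presumably why it is the figure quoted for each dataset.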