Background: Despite the tremendous progress recently made towards automatic sleep staging in adults, it is currently unknown whether the most advanced algorithms generalize to the pediatric population, which displays distinctive characteristics in overnight polysomnography (PSG). Methods: To answer this question, we conduct a large-scale comparative study of state-of-the-art deep learning methods for pediatric automatic sleep staging. Six deep neural networks with diverging features are evaluated on a sample of more than 1,200 children across a wide spectrum of obstructive sleep apnea (OSA) severity. Results: Our experimental results show that the individual performance of automated pediatric sleep stagers, when evaluated on new subjects, is equivalent to the expert-level performance reported for adults. Combining the six stagers into ensemble models further boosts staging accuracy, reaching an overall accuracy of 88.8%, a Cohen's kappa of 0.852, and a macro F1-score of 85.8%. At the same time, the ensemble models lead to reduced predictive uncertainty. The results also show that the studied algorithms and their ensembles are robust to concept drift when the training and test data are recorded seven months apart and after clinical intervention. Conclusion: The improvements in staging performance are, however, not necessarily clinically significant, although the ensemble models lead to more favorable clinical measures than the six standalone models. Significance: Detailed analyses further demonstrate "almost perfect" agreement among the automatic stagers, as well as similar patterns in their staging errors, suggesting little room for improvement.
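To make the reported metrics concrete, the following is a minimal sketch of how an ensemble of stagers could be scored with overall accuracy, Cohen's kappa, and macro F1. The abstract does not state how the six stagers are combined; averaging per-epoch class probabilities is an assumption made here for illustration, and all function names are hypothetical.

```python
from collections import Counter


def average_ensemble(prob_sets):
    """Average class probabilities across models (assumed ensembling rule).

    prob_sets[m][t][k] is the probability that model m assigns to
    sleep stage k at epoch t.
    """
    n_models = len(prob_sets)
    n_epochs = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(prob_sets[m][t][k] for m in range(n_models)) / n_models
         for k in range(n_classes)]
        for t in range(n_epochs)
    ]


def argmax_labels(probs):
    """Pick the most probable stage index for every epoch."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def accuracy(y_true, y_pred):
    """Fraction of epochs whose predicted stage matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement beyond chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes
```

In use, each model's per-epoch probabilities would be passed to `average_ensemble`, converted to stage labels with `argmax_labels`, and then compared against the expert scoring with the three metric functions. Macro F1 weights every stage equally, which matters for sleep data because stage durations are typically imbalanced.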