Deep learning (DL) methods have grown dramatically in popularity in recent years, and their application to supervised learning problems in the biomedical sciences has grown with them. However, missing data are more prevalent and more complex in modern biomedical datasets, which presents significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, \textit{dlglm}, that is one of the first to flexibly account for both ignorable and non-ignorable patterns of missingness in input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks when data are missing not at random (MNAR). We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data.
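The distinction between ignorable and non-ignorable missingness can be illustrated with a toy simulation. The sketch below is not the dlglm architecture and is not taken from the paper; the function name `simulate_missingness`, the logistic missingness model, and all parameter values are assumptions chosen for illustration. It shows that when missingness is unrelated to the data (MCAR, one ignorable case), the mean of the observed values stays close to the true mean, whereas when the probability of being missing depends on the unobserved value itself (MNAR), the observed mean is biased.

```python
import numpy as np


def simulate_missingness(x, mechanism, rng, rate=0.3):
    """Return a boolean mask (True = observed) under a toy mechanism.

    Hypothetical helper for illustration only; not part of dlglm.
    """
    n = x.shape[0]
    if mechanism == "MCAR":
        # Missingness is independent of the data: every entry is
        # dropped with the same probability `rate`.
        p_missing = np.full(n, rate)
    elif mechanism == "MNAR":
        # Missingness depends on the value that goes missing:
        # larger values are more likely to be dropped (assumed
        # logistic model with slope 2 on the standardized value).
        z = (x - x.mean()) / x.std()
        logits = 2.0 * z + np.log(rate / (1.0 - rate))
        p_missing = 1.0 / (1.0 + np.exp(-logits))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    missing = rng.random(n) < p_missing
    return ~missing


rng = np.random.default_rng(0)
x = rng.normal(size=20_000)  # fully observed "truth"

obs_mcar = simulate_missingness(x, "MCAR", rng)
obs_mnar = simulate_missingness(x, "MNAR", rng)

mean_full = x.mean()
mean_mcar = x[obs_mcar].mean()  # close to mean_full
mean_mnar = x[obs_mnar].mean()  # shifted downward: large values were dropped

print(f"full: {mean_full:.3f}  MCAR: {mean_mcar:.3f}  MNAR: {mean_mnar:.3f}")
```

In this toy setting, a method that simply ignores the missingness mechanism would give a reasonable estimate under MCAR but a biased one under MNAR, which is the situation the abstract targets.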