Neural network training and validation rely on the availability of large, high-quality datasets. However, in many cases only incomplete datasets are available, particularly in health care applications, where each patient typically undergoes different clinical procedures or may drop out of a study. Since the data used to train neural networks need to be complete, most studies discard the incomplete datapoints, which reduces the size of the training data, or impute the missing features, which can lead to artefacts. Alas, both approaches are inadequate when a large portion of the data is missing. Here, we introduce GapNet, an alternative deep-learning training approach that can use highly incomplete datasets. First, the dataset is split into subsets of samples containing all values for a certain cluster of features. Then, these subsets are used to train individual neural networks. Finally, this ensemble of neural networks is combined into a single neural network whose training is fine-tuned using all complete datapoints. Using two highly incomplete real-world medical datasets, we show that GapNet improves the identification of patients with underlying Alzheimer's disease pathology and of patients at risk of hospitalization due to Covid-19. By distilling the information available in incomplete datasets without having to reduce their size or to impute missing values, GapNet will allow valuable information to be extracted from a wide range of datasets, benefiting diverse fields from medicine to engineering.
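The three-step procedure outlined above can be illustrated compactly. Below is a minimal sketch of a GapNet-style two-stage training loop, assuming PyTorch, synthetic data with missing entries, and two hand-picked feature clusters; the names used here (e.g. `make_head`, `Combined`) are illustrative choices, not taken from the paper's implementation.

```python
# Sketch of GapNet-style training: per-cluster pre-training, then combined
# fine-tuning on fully complete samples. All names and sizes are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 6 features, ~25% of entries missing.
X = torch.randn(200, 6)
X[torch.rand(200, 6) < 0.25] = float("nan")
y = torch.randint(0, 2, (200,)).float()

# Hypothetical feature clusters (e.g. measurements typically acquired together).
clusters = [[0, 1, 2], [3, 4, 5]]

def rows_complete(x):
    """Mask of rows with no missing values."""
    return ~torch.isnan(x).any(dim=1)

def make_head(n_in):
    """Small sub-network encoding one feature cluster into 8 features."""
    return nn.Sequential(nn.Linear(n_in, 16), nn.ReLU(), nn.Linear(16, 8), nn.ReLU())

def train(model, x, t, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x).squeeze(-1), t).backward()
        opt.step()

# Steps 1-2: train one sub-network per feature cluster, each on the samples
# that are complete for that cluster (typically more than are complete overall).
heads = []
for cols in clusters:
    xc = X[:, cols]
    mask = rows_complete(xc)
    head = make_head(len(cols))
    train(nn.Sequential(head, nn.Linear(8, 1)), xc[mask], y[mask])
    heads.append(head)

# Step 3: combine the pre-trained heads into a single network and fine-tune
# it on the subset of samples that are complete for all features.
class Combined(nn.Module):
    def __init__(self, heads, clusters):
        super().__init__()
        self.heads = nn.ModuleList(heads)
        self.clusters = clusters
        self.out = nn.Linear(8 * len(heads), 1)

    def forward(self, x):
        parts = [h(x[:, cols]) for h, cols in zip(self.heads, self.clusters)]
        return self.out(torch.cat(parts, dim=1))

full_mask = rows_complete(X)
gapnet = Combined(heads, clusters)
train(gapnet, X[full_mask], y[full_mask], epochs=100)
```

In practice, the number of feature clusters and their composition would be chosen from the dataset's missingness pattern rather than hard-coded as in this sketch.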