With the advent of the big data era, data quality is becoming increasingly crucial. Missing values are one of the primary data quality issues, so developing effective imputation models is a key topic in the research community. A major recent research direction is to fill missing values with neural network models such as self-organizing maps or autoencoders. However, these classical methods can hardly discover correlation features and common features among data attributes at the same time. In particular, classical autoencoders often learn invalid constant mappings, which dramatically hurts filling performance. To solve these problems, we propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder. We first design a hidden layer that consists of de-tracking neurons and radial basis function neurons and incorporate it into an autoencoder, which enhances the ability to learn correlated features and common features. In addition, we develop a missing value filling strategy based on dynamic clustering (MVDC) that is incorporated into an iterative optimization process. This design enhances the multi-dimensional feature fusion ability and thus improves dynamic collaborative missing-value-filling performance. The effectiveness of our model is validated by experimental comparisons with many missing-value-filling methods on seven datasets with different missing rates.
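To make the hidden-layer idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a de-tracking neuron for attribute j simply ignores input j (so it cannot copy that attribute straight through) and that a radial basis function neuron is a Gaussian of the distance to a center. The names `detracking_unit`, `rbf_unit`, and `fused_hidden`, the `tanh` activation, and all parameters are hypothetical.

```python
import math

def detracking_unit(x, weights, bias, j):
    """Hypothetical de-tracking neuron for attribute j: it skips input j,
    so its activation depends only on the other attributes."""
    s = bias + sum(w * v for i, (w, v) in enumerate(zip(weights, x)) if i != j)
    return math.tanh(s)  # tanh is an assumed activation, not from the source

def rbf_unit(x, center, sigma):
    """Gaussian RBF neuron: response decays with squared distance to center."""
    d2 = sum((v - c) ** 2 for v, c in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def fused_hidden(x, W, b, centers, sigma):
    """Concatenate de-tracking and RBF activations into one hidden vector."""
    detrack = [detracking_unit(x, W[j], b[j], j) for j in range(len(x))]
    rbf = [rbf_unit(x, c, sigma) for c in centers]
    return detrack + rbf

# Illustrative values only (not data from the paper).
W = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6], [-0.7, 0.2, 0.3]]
b = [0.0, 0.0, 0.0]
centers = [[0.2, 0.9, 0.5]]
hidden = fused_hidden([0.2, 0.9, 0.5], W, b, centers, sigma=1.0)
```

In this sketch the de-tracking activations carry cross-attribute (correlation) information, while the RBF activations measure similarity to prototype records (common features). Because unit j never reads attribute j, a placeholder written into a missing cell cannot be echoed back unchanged by that unit.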