In many real-world scenarios, instances are incomplete, with various attributes missing, which poses challenges for classification. Missing-value imputation methods address this by filling in missing values with substitute values before classification. However, separating imputation from classification may degrade performance, since label information is ignored during imputation. Moreover, these imputation methods tend to initialize missing values under strong prior assumptions, while the unreliability of such initialization is rarely considered. To tackle these problems, this paper proposes a novel semi-supervised conditional normalizing flow (SSCFlow). SSCFlow explicitly uses the observed labels to facilitate imputation and classification simultaneously, employing a semi-supervised algorithm to estimate the conditional probability density of the missing values. Furthermore, SSCFlow treats the initialized missing values as a corrupted initial imputation and iteratively reconstructs their latent representations with an overcomplete denoising autoencoder to approximate the true conditional probability density of the missing values. Experiments conducted on real-world datasets demonstrate the robustness and efficiency of the proposed algorithm.
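The idea of treating a prior-based initial fill as a "corrupted" imputation and refining it iteratively can be illustrated with a much simpler stand-in. The sketch below is NOT SSCFlow: it swaps the normalizing flow and overcomplete denoising autoencoder for iterative low-rank (hard-impute-style) reconstruction via truncated SVD, and it ignores labels entirely. The function name `iterative_lowrank_impute` and its parameters `rank` and `n_iters` are hypothetical, chosen only for illustration.

```python
import numpy as np

def iterative_lowrank_impute(X, mask, rank=1, n_iters=50):
    """Hypothetical illustration, not SSCFlow.

    X: (n, d) data array; mask: boolean array, True where a value is observed.
    Assumes every column has at least one observed entry.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Strong-prior initialization: fill each missing entry with its column's observed mean.
    col_means = np.nanmean(np.where(mask, X, np.nan), axis=0)
    Z = np.where(mask, X, col_means)
    for _ in range(n_iters):
        # Treat the current estimate as corrupted and "denoise" it by projecting
        # onto a rank-r subspace (stand-in for the autoencoder reconstruction).
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        recon = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed entries stay fixed; only missing entries take the reconstruction.
        Z = np.where(mask, X, recon)
    return Z
```

Holding the observed entries fixed at every step means the refinement can only revise the initialized values, which mirrors, in a very reduced form, the abstract's point that the initial fill should not be trusted as final.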