In many real-world applications, instances are incomplete, with various attributes missing, which poses challenges for classification. Missing-value imputation methods are often used to replace the missing values with substitutes. However, imputation is usually carried out separately from classification, which may degrade performance because label information is often ignored during imputation. Moreover, traditional methods may rely on improper assumptions to initialize the missing values, and an unreliable initialization can further degrade performance. To address these problems, this paper proposes a novel semi-supervised conditional normalizing flow (SSCFlow). SSCFlow explicitly uses label information to perform imputation and classification simultaneously by estimating the conditional distribution of incomplete instances with a novel semi-supervised normalizing flow. In addition, SSCFlow treats the initialized missing values as a corrupted initial imputation and iteratively reconstructs their latent representations with an overcomplete denoising autoencoder to approximate their true conditional distribution. Experiments on real-world datasets demonstrate the robustness and effectiveness of the proposed algorithm.
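To make the idea of a label-conditional density concrete, the following is a minimal toy sketch and not the SSCFlow architecture. It assumes a hypothetical one-layer elementwise affine flow per class with a standard-normal base, scores an incomplete instance on its observed attributes only, classifies by Bayes' rule, and fills the missing attributes with the mode of the predicted class. The function names, the parameters `shifts`, `scales`, and `log_prior`, and the use of `NaN` to mark missing attributes are all illustrative assumptions; the paper's learned flow and denoising autoencoder are not reproduced here.

```python
import numpy as np

def conditional_log_density(x, observed, shift, scale):
    """log p(x_observed | y) under a toy elementwise affine flow (assumed, not SSCFlow)."""
    # Map x to latent space with the class-specific affine transform z = (x - shift) / scale.
    z = (x - shift) / scale
    # Standard-normal base log-density, per dimension.
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    # Log-determinant of the Jacobian dz/dx, per dimension (change of variables).
    log_det = -np.log(scale)
    # The flow factorizes across dimensions, so dropping missing ones marginalizes them out.
    return np.sum(np.where(observed, log_base + log_det, 0.0))

def classify_and_impute(x, shifts, scales, log_prior):
    """Pick the most probable class, then fill missing entries with that class's mode."""
    observed = ~np.isnan(x)
    scores = np.array([
        conditional_log_density(x, observed, shifts[k], scales[k]) + log_prior[k]
        for k in range(len(log_prior))
    ])
    label = int(np.argmax(scores))
    # For this toy flow the conditional mode of each attribute is the class shift.
    imputed = np.where(observed, x, shifts[label])
    return label, imputed

# Hypothetical two-class, two-attribute setup; the second attribute is missing.
shifts = np.array([[0.0, 0.0], [5.0, 5.0]])
scales = np.ones((2, 2))
log_prior = np.log(np.array([0.5, 0.5]))
label, imputed = classify_and_impute(np.array([0.1, np.nan]), shifts, scales, log_prior)
```

In this sketch the class decision and the imputed value come from the same conditional density, which is the coupling between imputation and classification that the abstract argues for; a learned, non-factorized flow would replace the closed-form marginalization and mode used here.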