Online streaming feature selection (OSFS), which conducts feature selection in an online manner, plays an important role in dealing with high-dimensional data. In many real applications, such as intelligent healthcare platforms, streaming features often contain missing data. This raises a crucial challenge for OSFS: how to establish the uncertain relationship between sparse streaming features and labels. Existing OSFS algorithms do not consider this uncertain relationship. To fill this gap, this paper proposes an online sparse streaming feature selection with uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent factor analysis is used to pre-estimate the missing data in sparse streaming features before feature selection is conducted, and 2) fuzzy logic and neighborhood rough sets are employed to alleviate the uncertainty between the estimated streaming features and labels during feature selection. In the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms on six real datasets. The results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.
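To make the first part concrete, the following is a minimal sketch of how latent factor analysis can pre-estimate missing entries. The abstract does not specify the model, so this assumes a common variant: an L2-regularized low-rank factorization trained by stochastic gradient descent on the observed entries only. The function name `lfa_impute`, its parameters, and the toy matrix are hypothetical and are not taken from the paper.

```python
import random


def lfa_impute(X, rank=2, lr=0.01, reg=0.001, epochs=2000, seed=0):
    """Fill None entries of X (rows = instances, columns = features).

    Assumed model (not from the paper): X[i][j] ~ dot(P[i], Q[j]),
    fitted by SGD on the observed entries with L2 regularization.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])

    # Small positive random initial factors for rows (P) and columns (Q).
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]

    # Only observed (non-missing) cells contribute to the training loss.
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if X[i][j] is not None]

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = X[i][j] - pred
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Simultaneous update of both factors using the old values.
                P[i][k] = p + lr * (err * q - reg * p)
                Q[j][k] = q + lr * (err * p - reg * q)

    # Keep observed values as they are; estimate only the missing ones.
    return [
        [X[i][j] if X[i][j] is not None
         else sum(P[i][k] * Q[j][k] for k in range(rank))
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    # Toy rank-1 matrix with one missing cell (hypothetical data).
    toy = [[1.0, 2.0, 3.0],
           [2.0, None, 6.0],
           [3.0, 6.0, 9.0],
           [4.0, 8.0, 12.0]]
    for row in lfa_impute(toy, rank=1):
        print(row)
```

In this sketch the completed matrix would then be passed to the selection stage. The fuzzy-logic and neighborhood-rough-set step is not shown, since the abstract gives no detail on how it is defined.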