Multi-view unsupervised feature selection has proven effective in reducing the dimensionality of high-dimensional, unlabeled multi-view data. Previous methods assume that all views are complete. In real applications, however, multi-view data are often incomplete, i.e., some views of some instances are missing, which causes these methods to fail. Moreover, when the data arrive as streams, existing methods suffer from high storage costs and long computation times. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) for incomplete multi-view streaming data. By jointly considering the consistent and complementary information across views, I$^2$MUFS embeds unsupervised feature selection into an extended weighted non-negative matrix factorization model, which learns a consensus clustering indicator matrix and fuses the latent feature matrices of the different views with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternating iterative algorithm in which the feature selection matrix is updated incrementally rather than recomputed from scratch on the entire updated data. A series of experiments compares the proposed method with several state-of-the-art methods. The experimental results demonstrate its effectiveness and efficiency in terms of clustering metrics and computational cost.
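
To make the idea of a weighted non-negative matrix factorization with a shared consensus matrix more concrete, the following is a minimal sketch and not the I$^2$MUFS algorithm itself. It assumes binary per-instance weights (0 when an instance is missing from a view), fixed uniform view weights instead of adaptive ones, standard multiplicative updates, and no incremental step. The function names `masked_multiview_nmf`, `masked_loss`, and `feature_scores` are hypothetical and introduced only for illustration.

```python
import numpy as np

def _observed(X, w):
    # Keep only columns of instances present in this view (assumption: w is 0/1).
    return np.where(w[None, :] > 0, X, 0.0)

def masked_loss(views, masks, Us, V):
    # Uniformly weighted reconstruction error over observed instances only.
    total = 0.0
    for X, w, U in zip(views, masks, Us):
        R = (_observed(X, w) - U @ V.T) * w[None, :]
        total += np.sum(R ** 2) / len(views)
    return total

def masked_multiview_nmf(views, masks, k, n_iter=200, eps=1e-9, seed=0):
    """views[v]: (d_v, n) non-negative data; masks[v]: (n,) 0/1 presence flags."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    Xs = [_observed(X, w) for X, w in zip(views, masks)]
    Us = [rng.random((X.shape[0], k)) for X in Xs]  # per-view basis matrices
    V = rng.random((n, k))                          # consensus matrix shared by all views
    alpha = 1.0 / len(views)                        # fixed view weight (simplification)
    for _ in range(n_iter):
        # Update each view's basis with V fixed.
        for i, (X, w) in enumerate(zip(Xs, masks)):
            Vw = V * w[:, None]
            Us[i] *= (X @ V) / (Us[i] @ (V.T @ Vw) + eps)
        # Update the consensus matrix with all bases fixed.
        num = np.zeros_like(V)
        den = np.zeros_like(V)
        for X, w, U in zip(Xs, masks, Us):
            num += alpha * (X.T @ U)
            den += alpha * (V * w[:, None]) @ (U.T @ U)
        V *= num / (den + eps)
    return Us, V

def feature_scores(Us):
    # Row norms of each basis matrix as a simple per-feature importance score.
    return [np.linalg.norm(U, axis=1) for U in Us]
```

Under these assumptions, features in each view could be ranked by their scores (for example with `np.argsort(-scores[v])`), and the rows of `V` could be passed to a clustering routine. Each instance should be observed in at least one view; otherwise its row of `V` collapses to zero.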