Multi-view unsupervised feature selection (MUFS) has recently received increasing attention for its promise in reducing the dimensionality of multi-view unlabeled data. Existing MUFS methods typically select discriminative features by capturing correlations between features and clustering labels. However, an important yet underexplored question remains: \textit{Are such correlations sufficiently reliable to guide feature selection?} In this paper, we analyze MUFS from a causal perspective by introducing a novel structural causal model, which reveals that existing methods may select irrelevant features because they overlook spurious correlations caused by confounders. Building on this causal perspective, we propose a novel MUFS method called CAusal multi-view Unsupervised feature Selection leArning (CAUSA). Specifically, we first employ a generalized unsupervised spectral regression model that identifies informative features by capturing dependencies between features and consensus clustering labels. We then introduce a causal regularization module that adaptively separates confounders from multi-view data and simultaneously learns view-shared sample weights to balance confounder distributions, thereby mitigating spurious correlations. Integrating these two components into a unified learning framework enables CAUSA to select causally informative features. Comprehensive experiments demonstrate that CAUSA outperforms several state-of-the-art methods. To our knowledge, this is the first in-depth study of causal multi-view feature selection in the unsupervised setting.
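
To make the role of sample weights in regression-based feature selection concrete, the following is a minimal single-view sketch and not CAUSA's actual objective, which the abstract does not state. It assumes the pseudo-labels `Y` and the sample weights are supplied by the caller, whereas the abstract describes them as learned jointly. It also swaps in plain weighted ridge regression for the generalized spectral regression model. The function names `weighted_ridge_feature_scores` and `select_top_k` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def weighted_ridge_feature_scores(X, Y, sample_weights, alpha=1.0):
    """Score features via a sample-weighted ridge regression onto pseudo-labels.

    Illustrative assumption, not the paper's formulation.
    X: (n, d) feature matrix for one view.
    Y: (n, c) pseudo-label indicator matrix (e.g. one-hot cluster labels).
    sample_weights: (n,) non-negative weights, given here rather than learned.
    Returns a (d,) vector: the l2 norm of each row of the projection matrix W.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    s = np.asarray(sample_weights, dtype=float)
    d = X.shape[1]

    # Minimize sum_i s_i * ||x_i^T W - y_i||^2 + alpha * ||W||_F^2.
    # Closed form: W = (X^T S X + alpha I)^{-1} X^T S Y, with S = diag(s).
    Xs = X * s[:, None]
    A = X.T @ Xs + alpha * np.eye(d)
    B = Xs.T @ Y
    W = np.linalg.solve(A, B)

    # A feature whose row of W has a large norm contributes strongly
    # to predicting the pseudo-labels.
    return np.linalg.norm(W, axis=1)


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]
```

Under these assumptions, changing `sample_weights` changes which samples drive the fit, and this is the mechanism a reweighting scheme would use to downweight samples that induce spurious feature-label correlations. Setting a sample's weight to zero is equivalent to removing it from the regression.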