Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS methods linearly project features into a pseudo-label space for clustering, but they suffer from two critical limitations: (1) an oversimplified linear mapping that fails to capture complex feature relationships, and (2) an assumption of uniform cluster distributions, ignoring outliers prevalent in real-world data. To address these issues, we propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which leverages a deep autoencoder to learn nonlinear feature representations while inherently improving robustness to outliers. We further develop an efficient optimization algorithm for RAEUFS. Extensive experiments demonstrate that our method outperforms state-of-the-art UFS approaches in both clean and outlier-contaminated data settings.
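The abstract does not spell out the RAEUFS objective or its optimization algorithm, so the following is only a minimal sketch of the general idea of scoring features with a nonlinear autoencoder. Everything specific here is an assumption made for illustration and is not taken from the paper: the single tanh hidden layer, the unsquared per-sample reconstruction norm used as a robust loss, the row-wise L2,1 penalty on the encoder weights, plain gradient descent, and the names `l21_autoencoder_scores` and `select_top_k`.

```python
import numpy as np


def l21_autoencoder_scores(X, n_hidden=4, lam=5e-3, lr=0.05, n_steps=2000, seed=0):
    """Score features with a one-hidden-layer autoencoder (illustrative only).

    Assumed loss: mean_i ||x_i - x_hat_i||_2 + lam * sum_j ||W1[j, :]||_2
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    eps = 1e-8
    W1 = 0.05 * rng.standard_normal((d, n_hidden))  # encoder weights
    W2 = 0.05 * rng.standard_normal((n_hidden, d))  # decoder weights

    for _ in range(n_steps):
        H = np.tanh(X @ W1)   # nonlinear code, shape (n, n_hidden)
        R = H @ W2 - X        # reconstruction residuals, shape (n, d)
        # Gradient of the unsquared per-sample norm w.r.t. each reconstruction:
        # a unit vector scaled by 1/n, so it stays bounded for every sample.
        G = R / (np.linalg.norm(R, axis=1, keepdims=True) + eps) / n
        grad_W2 = H.T @ G
        dZ = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        # Smoothed subgradient of the row-wise L2,1 penalty on the encoder.
        row_norms = np.linalg.norm(W1, axis=1, keepdims=True)
        grad_W1 = X.T @ dZ + lam * W1 / (row_norms + eps)
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2

    # One score per input feature: the norm of its encoder row.
    return np.linalg.norm(W1, axis=1)


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features."""
    return np.argsort(scores)[::-1][:k]


# Toy data (hypothetical): 3 high-variance features plus 5 near-constant ones.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(0.0, 1.0, size=(n, 3))
nuisance = rng.normal(0.0, 0.01, size=(n, 5))
X = np.hstack([signal, nuisance])

scores = l21_autoencoder_scores(X)
selected = select_top_k(scores, k=3)
print("feature scores:", np.round(scores, 3))
print("selected features:", sorted(selected.tolist()))
```

In this sketch the penalty shrinks encoder rows that contribute little to reconstruction, so the row norms act as feature scores. The toy data contains no outliers and no cluster structure, so it illustrates only the scoring mechanism, not the robustness or clustering behavior the abstract claims for RAEUFS.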

Feature Selection

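Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features out of M existing features so that a specified performance metric of the system is optimized. It is the process of selecting the most effective of the original features in order to reduce the dimensionality of a dataset. It is an important way to improve the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.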
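As a small illustration of choosing N of M features, the sketch below ranks columns by variance and keeps the top N. Variance is only one possible criterion, chosen here as an assumption for the example; the function names `select_features_by_variance` and `project` and the toy data are hypothetical.

```python
from statistics import pvariance


def select_features_by_variance(rows, n_keep):
    """Return the sorted indices of the n_keep columns with the largest variance."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    variances = [pvariance(col) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_keep])


def project(rows, kept):
    """Keep only the selected columns of every sample."""
    return [[row[j] for j in kept] for row in rows]


# M = 4 features; columns 1 and 2 are nearly constant.
data = [
    [1.0, 5.0, 0.50, 10.0],
    [2.0, 5.1, 0.50, 30.0],
    [3.0, 4.9, 0.50, 20.0],
    [4.0, 5.0, 0.51, 40.0],
]
kept = select_features_by_variance(data, n_keep=2)  # N = 2
reduced = project(data, kept)
print(kept)     # [0, 3]
print(reduced)  # [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
```

Variance ranking depends on feature scale, so features are usually normalized first when this criterion is used in practice.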
