Autoencoders have been widely used as a nonlinear tool for data dimensionality reduction. While autoencoders do not use label information, Centroid-Encoders (CE)\cite{ghosh2022supervised} incorporate class labels into the learning process. In this study, we propose a sparse optimization of the Centroid-Encoder architecture to determine a minimal set of features that discriminate between two or more classes. The resulting algorithm, Sparse Centroid-Encoder (SCE), extracts discriminatory features in groups using a sparsity-inducing $\ell_1$-norm while mapping each point to its class centroid. One key attribute of SCE is that it can extract informative features from multi-modal data sets, i.e., data sets whose classes consist of multiple clusters. The algorithm is applied to a wide variety of real-world data sets, including single-cell data, high-dimensional biological data, image data, speech data, and accelerometer sensor data. We compare our method to various state-of-the-art feature selection techniques, including supervised Concrete Autoencoders (SCAE), Feature Selection Network (FsNet), deep feature selection (DFS), Stochastic Gates (STG), and LassoNet. We empirically show that SCE features often yield better classification accuracy than the other methods on a sequestered test set.
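To make the objective concrete, the following is a minimal PyTorch sketch of an SCE-style model: a centroid-encoder whose input passes through an elementwise selection layer penalized by an $\ell_1$-norm. The class name \texttt{SparseCentroidEncoder}, the layer sizes, and the penalty weight \texttt{lam} are illustrative assumptions, not the authors' reference implementation.
\begin{verbatim}
# Sketch of a Sparse Centroid-Encoder-style objective (illustrative,
# not the authors' reference code).
import torch
import torch.nn as nn

class SparseCentroidEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=64, bottleneck=2):
        super().__init__()
        # Elementwise selection layer: an l1 penalty on these weights
        # drives most of them to zero, picking a small feature subset.
        self.feature_weights = nn.Parameter(torch.ones(in_dim))
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck))
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x * self.feature_weights))

def sce_loss(model, x, centroids, lam=1e-3):
    # Centroid-encoder cost: reconstruct each sample as the centroid of
    # its class (or of its nearest within-class cluster, for multi-modal
    # classes), plus a sparsity-inducing l1 term on the selection layer.
    recon = model(x)
    return (nn.functional.mse_loss(recon, centroids)
            + lam * model.feature_weights.abs().sum())
\end{verbatim}
Here \texttt{centroids} is assumed to be a tensor in which each row is the centroid assigned to the corresponding sample; after training, the features with nonzero entries in \texttt{feature\_weights} form the selected discriminatory set.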