A-SFS: Semi-supervised Feature Selection based on Multi-task Self-supervision

Feature selection is an important process in machine learning. It builds an interpretable and robust model by selecting the features that contribute the most to the prediction target. However, most mature feature selection algorithms, both supervised and semi-supervised, fail to fully exploit the complex latent structure between features. We believe that these structures are very important for the feature selection process, especially when labels are scarce and data is noisy. To this end, we introduce a deep learning-based self-supervised mechanism into feature selection problems, namely batch-Attention-based Self-supervision Feature Selection (A-SFS). First, a multi-task self-supervised autoencoder is designed to uncover the hidden structure among features with the support of two pretext tasks. Guided by the integrated information from the multi-self-supervised learning model, a batch-attention mechanism is designed to generate feature weights according to batch-based feature selection patterns, which alleviates the impact of a handful of noisy samples. The method is compared to 14 strong baselines, including LightGBM and XGBoost. Experimental results show that A-SFS achieves the highest accuracy on most datasets. Furthermore, this design significantly reduces the reliance on labels, needing only 1/10 of the labeled data to match the performance of the state-of-the-art baselines. Results also show that A-SFS is the most robust to noisy and missing data.
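The batch-attention idea in the abstract (one weight per feature, derived from a whole batch rather than from single samples) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-layer scoring network, the untrained random matrices `w1` and `w2`, and the function name `batch_attention_weights` are all assumptions made for illustration, and the self-supervised autoencoder and its pretext tasks are left out entirely.

```python
import numpy as np

def batch_attention_weights(x, w1, w2):
    """Return one attention weight per feature for a batch x of shape (batch, n_features).

    Hypothetical sketch: a small scoring network gives per-sample feature
    scores, which are averaged over the batch before a softmax, so that a
    few noisy samples have limited influence on the final weights.
    """
    hidden = np.tanh(x @ w1)              # (batch, n_hidden)
    scores = hidden @ w2                  # (batch, n_features)
    mean_scores = scores.mean(axis=0)     # average over the batch dimension
    shifted = mean_scores - mean_scores.max()  # shift for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()                # softmax over features

rng = np.random.default_rng(0)
n_features, n_hidden = 6, 4
x = rng.normal(size=(32, n_features))          # toy batch of 32 samples
w1 = rng.normal(size=(n_features, n_hidden))   # untrained, illustrative weights
w2 = rng.normal(size=(n_hidden, n_features))

weights = batch_attention_weights(x, w1, w2)
top_k = np.argsort(weights)[::-1][:3]          # indices of the 3 highest-weighted features
print(weights, top_k)
```

In a trained model the matrices would be learned, and the weights would presumably be guided by the self-supervised representation; here they are random, so the ranking itself carries no meaning.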


Related content

Feature selection


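Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features from the M available features so that a specific metric of the system is optimized. It is the process of picking the most effective features from the original set in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.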
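As a concrete illustration of choosing N features out of M, the sketch below ranks features by their absolute Pearson correlation with the target and keeps the top N. The correlation criterion, the function name `select_top_n`, and the synthetic data are assumptions chosen for brevity; this is a simple filter-style baseline, not the A-SFS method.

```python
import numpy as np

def select_top_n(x, y, n):
    """Return the sorted indices of the n features most correlated with y.

    x has shape (n_samples, M); y has shape (n_samples,).
    """
    x_c = x - x.mean(axis=0)                   # center each feature
    y_c = y - y.mean()                         # center the target
    denom = np.sqrt((x_c ** 2).sum(axis=0) * (y_c ** 2).sum())
    corr = np.abs(x_c.T @ y_c) / denom         # |Pearson r| per feature
    return np.sort(np.argsort(corr)[::-1][:n])

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))                  # M = 5 candidate features
# Synthetic target that depends only on features 1 and 4, plus small noise.
y = 3.0 * x[:, 1] - 2.0 * x[:, 4] + 0.1 * rng.normal(size=200)

selected = select_top_n(x, y, 2)               # N = 2
print(selected)
```

A univariate criterion like this scores each feature in isolation, so it cannot see interactions between features.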
