UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data

Missing data are a concern in many real-world data sets, and imputation methods are often needed to estimate the missing values. However, data sets that combine excessive missingness with high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and more accurate subsequent model predictions. The key property of this preprocessing is that it incorporates uncertainty: by accounting for the uncertainty due to missingness when selecting features, we can reduce the degree of missingness while also limiting the number of uninformative features used to build predictive models. We introduce a method for uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that it improves imputation accuracy across a variety of data sets and levels of missingness. The improved imputation due to UAFS also improves prediction accuracy when supervised learning is performed on the imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.
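The abstract does not spell out how UAFS scores features, so the following is only a minimal sketch of the general idea of selecting features before imputing, under assumptions of our own. It assumes a numeric feature matrix with NaN marking missing entries, scores each feature by its absolute correlation with the target on the observed rows, discounts that score by the fraction of observed values, and uses column-mean imputation as a stand-in imputer. The function names `uncertainty_penalized_scores` and `select_then_impute`, the scoring rule, and the choice of imputer are all hypothetical illustrations, not the authors' algorithm.

```python
import numpy as np


def uncertainty_penalized_scores(X, y):
    """Score each column of X by relevance discounted by missingness.

    Assumption (not from the paper): score = |corr(feature, y)| computed on
    the observed rows, multiplied by the fraction of observed values.
    """
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        observed = ~np.isnan(X[:, j])
        # Too few observed values, or a constant column: leave the score at 0.
        if observed.sum() < 3:
            continue
        xj, yj = X[observed, j], y[observed]
        if np.std(xj) == 0 or np.std(yj) == 0:
            continue
        relevance = abs(np.corrcoef(xj, yj)[0, 1])
        scores[j] = relevance * observed.mean()
    return scores


def select_then_impute(X, y, k):
    """Keep at most k top-scoring features, then mean-impute the rest.

    Mean imputation is a placeholder; any imputer could be applied to the
    reduced matrix instead.
    """
    scores = uncertainty_penalized_scores(X, y)
    order = np.argsort(scores)[::-1][:k]
    # Drop features whose score is 0 (unusable under the rule above).
    keep = np.sort(np.array([j for j in order if scores[j] > 0], dtype=int))
    X_sel = X[:, keep]
    col_means = np.nanmean(X_sel, axis=0)
    X_imputed = np.where(np.isnan(X_sel), col_means, X_sel)
    return keep, X_imputed
```

In this sketch a feature that is highly informative but almost entirely missing is ranked below a comparably informative feature with few gaps, which is one simple way to trade relevance against the uncertainty introduced by missingness.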

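Feature selection

Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific metric of the system is optimized. It picks the most effective features from the original set in order to reduce the dimensionality of the data set. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.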
