Powershap: A Power-full Shapley Feature Selection Method

Feature selection is a crucial step in developing robust and powerful machine learning models. Feature selection techniques can be divided into two categories: filter methods and wrapper methods. Wrapper methods commonly yield strong predictive performance, but they have a high computational complexity and therefore take a significant amount of time to complete, especially on high-dimensional feature sets. Filter methods are considerably faster, but suffer from several other disadvantages: (i) they require a threshold value, (ii) they do not take intercorrelation between features into account, and (iii) they ignore feature interactions with the model. To this end, we present powershap, a novel wrapper feature selection method that combines statistical hypothesis testing and power calculations with Shapley values for quick and intuitive feature selection. Powershap is built on the core assumption that an informative feature has a larger impact on the prediction than a known random feature. Benchmarks and simulations show that powershap outperforms filter methods, with predictive performance on par with wrapper methods, while being significantly faster than the latter, often needing only half or a third of the execution time. As such, powershap provides a competitive and quick algorithm that can be used with various models in different domains. Furthermore, powershap is implemented as a plug-and-play, open-source sklearn component, enabling easy integration into conventional data science pipelines. User experience is further enhanced by an automatic mode that tunes the hyper-parameters of the powershap algorithm, so that the algorithm can be used without any configuration.
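The core assumption above (an informative feature should have a larger impact than a known random feature) can be illustrated with a minimal sketch. This is not the powershap implementation: it assumes a plain least-squares linear model, for which the Shapley value of feature j is w_j * (x_j - mean_j) under feature independence, and it uses a simplified percentile-style p-value. The power calculations and the automatic mode are omitted, and all function names, the 75/25 split, and the default `iterations` and `alpha` values are hypothetical choices made for illustration.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ w = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def fit_linear(rows, targets):
    """Least squares on centered data; returns (weights, feature means)."""
    n_feat = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
    y_mean = sum(targets) / len(targets)
    xc = [[r[j] - means[j] for j in range(n_feat)] for r in rows]
    yc = [t - y_mean for t in targets]
    gram = [[sum(x[i] * x[j] for x in xc) for j in range(n_feat)]
            for i in range(n_feat)]
    rhs = [sum(x[i] * t for x, t in zip(xc, yc)) for i in range(n_feat)]
    return solve_linear_system(gram, rhs), means

def mean_abs_impacts(weights, means, rows):
    """Mean |Shapley value| per feature, using w_j * (x_j - mean_j)."""
    return [sum(abs(w * (r[j] - mu)) for r in rows) / len(rows)
            for j, (w, mu) in enumerate(zip(weights, means))]

def select_features(rows, targets, iterations=20, alpha=0.01, seed=0):
    """Keep features whose impact reliably exceeds that of a random feature."""
    n_feat = len(rows[0])
    impacts = [[] for _ in range(n_feat + 1)]  # last slot: random feature
    for it in range(iterations):
        rng = random.Random(seed + it)
        # Append a freshly drawn, known-uninformative feature to every row.
        augmented = [r + [rng.uniform(-1, 1)] for r in rows]
        idx = list(range(len(rows)))
        rng.shuffle(idx)
        cut = int(0.75 * len(idx))
        train, val = idx[:cut], idx[cut:]
        w, mu = fit_linear([augmented[i] for i in train],
                           [targets[i] for i in train])
        val_rows = [augmented[i] for i in val]
        for j, v in enumerate(mean_abs_impacts(w, mu, val_rows)):
            impacts[j].append(v)
    random_mean = sum(impacts[-1]) / iterations
    # Simplified p-value: share of iterations in which the feature's impact
    # did not exceed the average impact of the random feature.
    p_values = [sum(v <= random_mean for v in impacts[j]) / iterations
                for j in range(n_feat)]
    selected = [j for j, p in enumerate(p_values) if p < alpha]
    return selected, p_values

# Toy data: features 0 and 1 drive the target, features 2 and 3 are noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(400)]
y = [3.0 * x[0] - 2.0 * x[1] + data_rng.gauss(0, 0.1) for x in X]
selected, p_values = select_features(X, y)
print("selected:", selected)
print("p-values:", p_values)
```

On this toy data the two informative features end up with a p-value of 0 and are selected. The noise columns will usually, but not necessarily, be rejected, because with a fixed dataset a spurious weight can persist across iterations.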


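Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from the M available features so as to optimize a given performance criterion of the system. By keeping only the most effective of the original features, it reduces the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For any learning algorithm, good training samples are key to training the model.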
