April 20, 2023

Model free variable importance for high dimensional data

Naofumi Hama, Masayoshi Mase, Art B. Owen

A model-agnostic variable importance method can be used with arbitrary prediction functions. Here we present some model-free methods that do not require access to the prediction function. This is useful when that function is proprietary and not available, or just extremely expensive. It is also useful when studying residuals from a model. The cohort Shapley (CS) method is model-free but has exponential cost in the dimension of the input space. A supervised on-manifold Shapley method from Frye et al. (2020) is also model-free but requires as input a second black box model that has to be trained for the Shapley value problem. We introduce an integrated gradient (IG) version of cohort Shapley, called IGCS, with cost $\mathcal{O}(nd)$. We show that, over the vast majority of the relevant unit cube, the IGCS value function is close to a multilinear function for which IGCS matches CS. Another benefit of IGCS is that it allows IG methods to be used with binary predictors. We use some area between curves (ABC) measures to quantify the performance of IGCS. On a problem from high energy physics we verify that IGCS has nearly the same ABCs as CS does. We also use it on a problem from computational chemistry in 1024 variables. We see there that IGCS attains much higher ABCs than we get from Monte Carlo sampling. The code is publicly available at https://github.com/cohortshapley/cohortintgrad
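To make the exponential cost of cohort Shapley concrete, here is a minimal Python sketch of an exact computation on toy data. It is an illustration under stated assumptions, not the authors' implementation (that lives in the repository linked above). The function names `cohort_value` and `cohort_shapley`, the toy arrays, and the exact-equality similarity rule are all hypothetical choices made for this example. The value of a variable subset is taken to be the mean response over the subjects that match the target subject on every variable in the subset, and the standard Shapley weights are applied to that value function. Only the data `X` and responses `y` are used; no prediction function is called, which is the sense in which the method is model-free.

```python
from itertools import combinations
from math import factorial

import numpy as np


def cohort_value(X, y, t, u):
    """Mean response over subjects that match subject t on every variable in u.

    Assumption for this sketch: two subjects are "similar" on a variable
    only when their values are exactly equal.
    """
    if len(u) == 0:
        return float(y.mean())  # empty subset: the cohort is the whole data set
    cols = list(u)
    in_cohort = np.all(X[:, cols] == X[t, cols], axis=1)
    return float(y[in_cohort].mean())  # subject t is always in its own cohort


def cohort_shapley(X, y, t):
    """Exact Shapley values of the cohort value function for subject t."""
    n, d = X.shape
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for a subset of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                gain = cohort_value(X, y, t, u + (j,)) - cohort_value(X, y, t, u)
                phi[j] += weight * gain
    return phi


# Toy data (hypothetical): 6 subjects, 3 binary variables
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 0],
              [1, 1, 1]])
y = np.array([1.0, 2.0, 3.0, 5.0, 0.5, 6.0])

phi = cohort_shapley(X, y, t=0)
print(phi, phi.sum())
```

The inner loops visit all $2^{d-1}$ subsets of the other variables for each of the $d$ variables, which is why this exact approach becomes infeasible as $d$ grows. By the efficiency property of Shapley values, the attributions sum to the full-cohort mean minus the overall mean of `y`.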
