Scientists and practitioners increasingly rely on machine learning to model data and draw conclusions. Compared to classical statistical modeling, machine learning makes fewer explicit assumptions about the structure of the data, such as linearity. However, the parameters of machine learning models usually cannot be easily related to the data generating process. To learn about the modeled relationships, partial dependence (PD) plots and permutation feature importance (PFI) are often used as interpretation methods, yet PD and PFI lack a theory that relates them to the data generating process. We formalize PD and PFI as statistical estimators of ground-truth estimands rooted in the data generating process. We show that PD and PFI estimates deviate from this ground truth due to statistical biases, model variance, and Monte Carlo approximation errors. To account for model variance in PD and PFI estimation, we propose the learner-PD and the learner-PFI, which are based on model refits, together with corrected variance and confidence interval estimators.
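To make the quantities discussed above concrete, here is a minimal sketch of Monte Carlo PD and PFI estimation, plus a refit-based learner-PFI that captures model variance across refits. This is an illustrative toy (simulated linear data, a least-squares learner, bootstrap refits, and the helper names `partial_dependence`, `permutation_importance`, and `learner_pfi` are all our own assumptions), not the paper's exact estimators or variance corrections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data generating process: y depends on x1 only, x2 is noise.
n = 500
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=n)

def fit_linear(X, y):
    """Least-squares learner; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: beta[0] + Z @ beta[1:]

model = fit_linear(X, y)

def partial_dependence(model, X, j, grid):
    """Monte Carlo PD: mean prediction with feature j fixed at each grid value."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pd_vals.append(model(Xv).mean())
    return np.array(pd_vals)

def permutation_importance(model, X, y, j, rng):
    """PFI: increase in MSE when feature j is permuted (one permutation draw)."""
    base_mse = np.mean((y - model(X)) ** 2)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return np.mean((y - model(Xp)) ** 2) - base_mse

def learner_pfi(fit, X, y, j, n_refits=15, rng=None):
    """Refit-based PFI: refit the learner on bootstrap samples so the
    spread of importances reflects model variance, not just one fitted model.
    (A sketch of the idea; the paper's corrected variance estimator differs.)"""
    rng = rng or np.random.default_rng(0)
    scores = []
    for _ in range(n_refits):
        idx = rng.integers(0, len(X), size=len(X))
        m = fit(X[idx], y[idx])
        scores.append(permutation_importance(m, X, y, j, rng))
    scores = np.array(scores)
    return scores.mean(), scores.std(ddof=1)
```

On this toy data, the PD curve for the informative feature is steep while the noise feature's curve is flat, and `learner_pfi` returns both a mean importance and a naive standard deviation over refits, the quantity the corrected variance estimators are designed to calibrate.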