混合数据有条件特点重要性 (Conditional Feature Importance for Mixed Data) - 专知论文

会员服务 ·

0

统计量 · 边缘化 · 控制器 · Continuity · 分类数据 ·

2022 年 10 月 6 日

Conditional Feature Importance for Mixed Data

翻译：混合数据有条件特点重要性

Kristin Blesch,David S. Watson,Marvin N. Wright

Despite the popularity of feature importance measures in interpretable machine learning, the statistical adequacy of these methods is rarely discussed. From a statistical perspective, a major distinction is between analyzing a variable's importance before and after adjusting for covariates - i.e., between marginal and conditional measures. Our work draws attention to this rarely acknowledged, yet crucial distinction and showcases its implications. Further, we reveal that for testing conditional feature importance (CFI), only few methods are available and practitioners have hitherto been severely restricted in method application due to mismatching data requirements. Most real-world data exhibits complex feature dependencies and incorporates both continuous and categorical data (mixed data). Both properties are oftentimes neglected by CFI measures. To fill this gap, we propose to combine the conditional predictive impact (CPI) framework (arXiv:1901.09917) with sequential knockoff sampling (arXiv:2010.14026). The CPI enables CFI measurement that controls for any feature dependencies by sampling valid knockoffs - hence, generating synthetic data with similar statistical properties - for the data to be analyzed. Sequential knockoffs were deliberately designed to handle mixed data and thus allow us to extend the CPI approach to such datasets. We demonstrate through numerous simulations and a real-world example that our proposed workflow controls type I error, achieves high power and is in line with results given by other CFI measures, whereas marginal feature importance metrics result in misleading interpretations. Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.

翻译：尽管在可解释的机器学习中采用具有重要特点的措施很受欢迎,但这些方法的统计充分性却很少讨论。从统计角度看,主要区别在于分析变量在调整共同变数前后的重要性,即边际和有条件措施。我们的工作提请人们注意这一点很少被承认,但关键的区别和说明其影响。此外,我们发现,对于测试有条件特征的重要性(CFI),由于数据要求的不匹配,只有很少的方法和从业者在方法应用方面受到严格限制。大多数真实世界数据都显示出复杂的特征,并包含连续和绝对的数据(混合数据)。两种属性往往被CFI措施忽视。为了填补这一差距,我们提议将有条件的预测效果(CPI)框架(arXiv:190/9917)与连续的敲击抽样(arXiv:2010.14026)结合起来。CPI通过取样有效的欺骗(因此产生具有类似统计特性的合成数据),使CFIFI对数据进行控制的方法非常复杂和明确性数据分析。在CFI调查中有意设计具有决定性的特点,因此,我们通过模拟式数据模型将这种结果扩大到模拟,我们采用其他方法。

0

相关内容

统计量

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

同型半胱氨酸经ERK通路上调ETB受体表达促血管平滑肌细胞增殖机制

国家自然科学基金

0+阅读 · 2015年12月31日

SirT1介导硫化氢对自发性高血压大鼠血管平滑肌细胞增殖的抑制效应

国家自然科学基金

0+阅读 · 2015年12月31日

癸基泛醌上调BAI1抑制肿瘤血管形成的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

炎症调节因子c-REL抑制TAp73的抗骨肉瘤作用的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

GLP-1通过microRNA信号分子网络影响胰腺的增殖和凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

TWIST在胃癌多药耐药中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

视黄酸诱导神经干细胞分化过程中蛋白磷酸化的动态研究

国家自然科学基金

0+阅读 · 2012年12月31日

EphA2介导的细胞自噬在鼻咽癌紫杉醇耐药中的作用及分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

胃癌17q21.33缺失区内候选基因TOB1功能及其失活机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

A Linear Time Algorithm for the Optimal Discrete IRS Beamforming

Arxiv

0+阅读 · 2022年11月9日

Adaptive Semantic Communications: Overfitting the Source and Channel for Profit

Arxiv

0+阅读 · 2022年11月8日

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Arxiv

0+阅读 · 2022年11月8日

Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression

Arxiv

0+阅读 · 2022年11月8日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

Arxiv

11+阅读 · 2019年3月25日

Feature Denoising for Improving Adversarial Robustness

Feature Denoising for Improving Adversarial Robustness

Arxiv

15+阅读 · 2018年12月9日

VIP会员

文章信息

相关主题

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

用于无人机的C波段空地通信系统研究 | 2025最新116页

甚高频军事战术通信系统传播性能分析研究

军事通信系统：安全行动的支柱

卫星与地面通信系统：美陆军面临的空间与电子战局势 | 39页报告

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

A Linear Time Algorithm for the Optimal Discrete IRS Beamforming

Arxiv

0+阅读 · 2022年11月9日

Adaptive Semantic Communications: Overfitting the Source and Channel for Profit

Arxiv

0+阅读 · 2022年11月8日

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Arxiv

0+阅读 · 2022年11月8日

Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression

Arxiv

0+阅读 · 2022年11月8日

Generative Models as a Data Source for Multiview Representation Learning

Arxiv

16+阅读 · 2021年6月9日

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Arxiv

17+阅读 · 2021年3月19日

Reinforced Negative Sampling over Knowledge Graph for Recommendation

Arxiv

17+阅读 · 2020年3月12日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning

Arxiv

11+阅读 · 2019年3月25日

Feature Denoising for Improving Adversarial Robustness

Feature Denoising for Improving Adversarial Robustness

Arxiv

15+阅读 · 2018年12月9日

相关基金

同型半胱氨酸经ERK通路上调ETB受体表达促血管平滑肌细胞增殖机制

国家自然科学基金

0+阅读 · 2015年12月31日

SirT1介导硫化氢对自发性高血压大鼠血管平滑肌细胞增殖的抑制效应

国家自然科学基金

0+阅读 · 2015年12月31日

癸基泛醌上调BAI1抑制肿瘤血管形成的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

炎症调节因子c-REL抑制TAp73的抗骨肉瘤作用的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

GLP-1通过microRNA信号分子网络影响胰腺的增殖和凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

TWIST在胃癌多药耐药中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

视黄酸诱导神经干细胞分化过程中蛋白磷酸化的动态研究

国家自然科学基金

0+阅读 · 2012年12月31日

EphA2介导的细胞自噬在鼻咽癌紫杉醇耐药中的作用及分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

胃癌17q21.33缺失区内候选基因TOB1功能及其失活机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员