Data Banzhaf: A Data Valuation Framework with Maximal Robustness to Learning Stochasticity - Zhuanzhi Papers


Robustness · Learning · Sample complexity · MSR · Shapley value

June 28, 2022



Tianhao Wang,Ruoxi Jia

This paper studies the robustness of data valuation to noisy model performance scores. In particular, we find that the inherent randomness of the widely used stochastic gradient descent can cause existing data value notions (e.g., the Shapley value and the Leave-one-out error) to produce inconsistent data value rankings across different runs. To address this challenge, we first pose a formal framework within which one can measure the robustness of a data value notion. We show that the Banzhaf value, a value notion originating from the cooperative game theory literature, achieves the maximal robustness among all semivalues, a class of value notions that satisfy crucial properties entailed by ML applications. We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. We derive a lower bound on the sample complexity of Banzhaf value approximation, and we show that our MSR algorithm's sample complexity nearly matches this lower bound. Our evaluation demonstrates that the Banzhaf value outperforms the existing semivalue-based data value notions on several downstream ML tasks such as learning with weighted samples and noisy label detection. Overall, our study suggests that when the underlying ML algorithm is stochastic, the Banzhaf value is a promising alternative to the other semivalue-based data value schemes given its computational advantage and ability to robustly differentiate data quality.
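The two central objects in the abstract can be made concrete with a small sketch. The Banzhaf value of point i averages its marginal contribution U(S ∪ {i}) − U(S) uniformly over all 2^(n−1) subsets S of the other points. An MSR-style estimator, as we understand the principle named in the abstract, samples subsets that include each point independently with probability 1/2 and reuses every utility evaluation for all n points: the estimate for point i is the mean utility of sampled subsets containing i minus the mean utility of sampled subsets that do not. The function names `exact_banzhaf` and `msr_banzhaf`, the toy utilities, and the sample count are hypothetical illustrations, not the authors' code; in real data valuation, `utility(S)` would be something like the test accuracy of a model trained on subset S, which is where SGD noise enters.

```python
import itertools
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of data-point indices to a score (hypothetical toy stand-in
# for "performance of a model trained on that subset").
Utility = Callable[[FrozenSet[int]], float]


def exact_banzhaf(n: int, utility: Utility) -> List[float]:
    """Exact Banzhaf values by enumerating all 2^(n-1) subsets per point (small n only)."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                # Marginal contribution of point i to subset s.
                total += utility(s | {i}) - utility(s)
        values.append(total / 2 ** (n - 1))
    return values


def msr_banzhaf(n: int, utility: Utility, num_samples: int, seed: int = 0) -> List[float]:
    """Sketch of a sample-reuse Banzhaf estimator: every sampled utility is used for all points."""
    rng = random.Random(seed)
    sum_in, cnt_in = [0.0] * n, [0] * n
    sum_out, cnt_out = [0.0] * n, [0] * n
    for _ in range(num_samples):
        # Include each point independently with probability 1/2, i.e. a uniform draw over all 2^n subsets.
        subset = frozenset(j for j in range(n) if rng.random() < 0.5)
        u = utility(subset)  # one (possibly expensive) utility evaluation per sample
        for i in range(n):
            if i in subset:
                sum_in[i] += u
                cnt_in[i] += 1
            else:
                sum_out[i] += u
                cnt_out[i] += 1
    estimates = []
    for i in range(n):
        mean_in = sum_in[i] / cnt_in[i] if cnt_in[i] else 0.0
        mean_out = sum_out[i] / cnt_out[i] if cnt_out[i] else 0.0
        estimates.append(mean_in - mean_out)
    return estimates


if __name__ == "__main__":
    # Toy additive utility: each point's Banzhaf value equals its own weight.
    weights = [0.1, 0.5, 1.0, -0.3]
    additive = lambda s: sum(weights[j] for j in s)
    print(exact_banzhaf(4, additive))
    print(msr_banzhaf(4, additive, num_samples=20000, seed=0))
```

Because each sampled subset contributes to every point's estimate, one batch of model trainings yields estimates for all n points at once, rather than requiring separate paired evaluations per point. The sketch makes no claim about matching the paper's sample-complexity bounds or its experimental setup.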

