设计利用仿真用户研究评估机器学习解释的案例研究 (A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies) - 专知论文

会员服务 ·

0

CASE · 原点 · Use Case · MoDELS · CASES ·

2023 年 3 月 20 日

A Case Study on Designing Evaluations of ML Explanations with Simulated User Studies

翻译：设计利用仿真用户研究评估机器学习解释的案例研究

Ada Martin,Valerie Chen,Sérgio Jesus,Pedro Saleiro

from arxiv, 9 pages, 2 figures. Will appear in ICLR 2023's TrustML-(un)Limited workshop

When conducting user studies to ascertain the usefulness of model explanations in aiding human decision-making, it is important to use real-world use cases, data, and users. However, this process can be resource-intensive, allowing only a limited number of explanation methods to be evaluated. Simulated user evaluations (SimEvals), which use machine learning models as a proxy for human users, have been proposed as an intermediate step to select promising explanation methods. In this work, we conduct the first SimEvals on a real-world use case to evaluate whether explanations can better support ML-assisted decision-making in e-commerce fraud detection. We study whether SimEvals can corroborate findings from a user study conducted in this fraud detection context. In particular, we find that SimEvals suggest that all considered explainers are equally performant, and none beat a baseline without explanations -- this matches the conclusions of the original user study. Such correspondences between our results and the original user study provide initial evidence in favor of using SimEvals before running user studies. We also explore the use of SimEvals as a cheap proxy to explore an alternative user study set-up. We hope that this work motivates further study of when and how SimEvals should be used to aid in the design of real-world evaluations.

翻译：当进行用户研究以确定模型解释在辅助人类决策方面的有用性时，使用真实的使用案例、数据和用户非常重要。然而，这个过程需要大量的资源，只能评估有限数量的解释方法。仿真用户评估（SimEvals）使用机器学习模型作为人类用户的代理被提出作为选择有前途的解释方法的中间步骤。在这项工作中，我们在一个真实案例中进行了第一次的SimEvals，以评估解释是否能更好地支持电子商务欺诈检测中的机器学习辅助决策。我们研究了SimEvals是否可以证实在这个欺诈检测环境下进行的用户研究的发现。特别是，我们发现SimEvals表明，所有考虑的解释方法的性能均相等，并且没有比没有解释的基线更好--这与原始用户研究的结论相符。我们的结果与原始用户研究的这种对应关系为使用SimEvals在进行用户研究之前提供了初步的支持。我们还探讨了SimEvals作为廉价代理人以探索替代用户研究设置的使用。我们希望这项工作鼓励进一步研究何时以及如何使用SimEvals来帮助设计真实评估。

0

相关内容

CASE

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

专知会员服务

142+阅读 · 2022年11月5日

不可错过！华盛顿大学最新《可解释人工智能》课程，系统讲述XAI最新进展

不可错过！华盛顿大学最新《可解释人工智能》课程，系统讲述XAI最新进展

专知会员服务

70+阅读 · 2022年9月14日

【PKDD2020教程】可解释人工智能XAI:算法到应用，200页ppt

专知会员服务

41+阅读 · 2020年10月13日

【剑桥大学】统计因果关系的决策理论基础，Decision-theoretic foundations for statistical causality

【剑桥大学】统计因果关系的决策理论基础，Decision-theoretic foundations for statistical causality

专知会员服务

48+阅读 · 2020年5月5日

知识图嵌入和可解释人工智能 Knowledge Graph Embeddings and Explainable AI

知识图嵌入和可解释人工智能 Knowledge Graph Embeddings and Explainable AI

专知会员服务

136+阅读 · 2020年5月1日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

专知

46+阅读 · 2022年11月5日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

专知

29+阅读 · 2019年3月1日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】(Python)多种模型(Naive Bayes, SVM, CNN, LSTM, etc)实现推文情感分析

【推荐】(Python)多种模型(Naive Bayes, SVM, CNN, LSTM, etc)实现推文情感分析

机器学习研究会

13+阅读 · 2017年12月25日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

高维积分波动率矩阵的估计及其在资产投资中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于质量管理的不确定性双向感性工学

国家自然科学基金

0+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

6+阅读 · 2014年12月31日

高维数据下多因变量回归模型的统计推断

国家自然科学基金

5+阅读 · 2013年12月31日

Kindlin-2在上皮性卵巢癌中调节雌激素受体α表达的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

不完全数据下广义半参数可加模型的统计推断

国家自然科学基金

0+阅读 · 2013年12月31日

随机控制系统的输入到状态稳定性

国家自然科学基金

0+阅读 · 2013年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

整数值时间序列数据的建模方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于抽象解释的逻辑程序验证研究

国家自然科学基金

1+阅读 · 2008年12月31日

Active Retrieval Augmented Generation

Arxiv

0+阅读 · 2023年5月11日

Automatic Evaluation of Attribution by Large Language Models

Arxiv

0+阅读 · 2023年5月10日

An Empirical Study on How the Developers Discussed about Pandas Topics

Arxiv

0+阅读 · 2023年5月10日

Statistical Plasmode Simulations -- Potentials, Challenges and Recommendations

Arxiv

0+阅读 · 2023年5月10日

IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience

Arxiv

0+阅读 · 2023年5月10日

An Evaluation Framework for Personalization Strategy Experiment Designs

Arxiv

0+阅读 · 2023年5月9日

A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation

Arxiv

12+阅读 · 2022年10月21日

A Survey of Deep Causal Model

Arxiv

45+阅读 · 2022年9月19日

ExSum: From Local Explanations to Model Understanding

Arxiv

13+阅读 · 2022年4月30日

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Arxiv

11+阅读 · 2020年5月8日

VIP会员

文章信息

相关主题

相关VIP内容

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

专知会员服务

142+阅读 · 2022年11月5日

不可错过！华盛顿大学最新《可解释人工智能》课程，系统讲述XAI最新进展

不可错过！华盛顿大学最新《可解释人工智能》课程，系统讲述XAI最新进展

专知会员服务

70+阅读 · 2022年9月14日

【PKDD2020教程】可解释人工智能XAI:算法到应用，200页ppt

专知会员服务

41+阅读 · 2020年10月13日

【剑桥大学】统计因果关系的决策理论基础，Decision-theoretic foundations for statistical causality

【剑桥大学】统计因果关系的决策理论基础，Decision-theoretic foundations for statistical causality

专知会员服务

48+阅读 · 2020年5月5日

知识图嵌入和可解释人工智能 Knowledge Graph Embeddings and Explainable AI

知识图嵌入和可解释人工智能 Knowledge Graph Embeddings and Explainable AI

专知会员服务

136+阅读 · 2020年5月1日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《英国国防部：辐射检测与监测设备》

《反无人机蜂群技术发展现状及新型防御系统分析》

人工智能在军事战略中的应用：指挥管理的新方法

《强大人工智能世界中维护安全：未来国防架构的考量》

相关资讯

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

【2022新书】机器学习中的统计建模:概念和应用，398页pdf

专知

46+阅读 · 2022年11月5日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

【Awesome】最全的机器学习可解释性资料（machine-learning-interpretability）

专知

29+阅读 · 2019年3月1日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】(Python)多种模型(Naive Bayes, SVM, CNN, LSTM, etc)实现推文情感分析

【推荐】(Python)多种模型(Naive Bayes, SVM, CNN, LSTM, etc)实现推文情感分析

机器学习研究会

13+阅读 · 2017年12月25日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

相关论文

Active Retrieval Augmented Generation

Arxiv

0+阅读 · 2023年5月11日

Automatic Evaluation of Attribution by Large Language Models

Arxiv

0+阅读 · 2023年5月10日

An Empirical Study on How the Developers Discussed about Pandas Topics

Arxiv

0+阅读 · 2023年5月10日

Statistical Plasmode Simulations -- Potentials, Challenges and Recommendations

Arxiv

0+阅读 · 2023年5月10日

IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience

Arxiv

0+阅读 · 2023年5月10日

An Evaluation Framework for Personalization Strategy Experiment Designs

Arxiv

0+阅读 · 2023年5月9日

A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation

Arxiv

12+阅读 · 2022年10月21日

A Survey of Deep Causal Model

Arxiv

45+阅读 · 2022年9月19日

ExSum: From Local Explanations to Model Understanding

Arxiv

13+阅读 · 2022年4月30日

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

Arxiv

11+阅读 · 2020年5月8日

相关基金

高维积分波动率矩阵的估计及其在资产投资中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

基于质量管理的不确定性双向感性工学

国家自然科学基金

0+阅读 · 2014年12月31日

概率抽样设计及其统计推断方法

国家自然科学基金

6+阅读 · 2014年12月31日

高维数据下多因变量回归模型的统计推断

国家自然科学基金

5+阅读 · 2013年12月31日

Kindlin-2在上皮性卵巢癌中调节雌激素受体α表达的作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

不完全数据下广义半参数可加模型的统计推断

国家自然科学基金

0+阅读 · 2013年12月31日

随机控制系统的输入到状态稳定性

国家自然科学基金

0+阅读 · 2013年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

整数值时间序列数据的建模方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于抽象解释的逻辑程序验证研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员