Evaluation Gaps in Machine Learning Practice

Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors including harms, benefits, and responsibilities. In practice, however, evaluations of ML models frequently focus on only a narrow range of decontextualized predictive behaviours. We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations. Through an empirical study of papers from recent high-profile conferences in the Computer Vision and Natural Language Processing communities, we demonstrate a general focus on a handful of evaluation methods. By considering the metrics and test data distributions used in these methods, we draw attention to which properties of models are centered in the field, revealing the properties that are frequently neglected or sidelined during evaluation. By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts; these include commitments to consequentialism, abstractability from context, the quantifiability of impacts, the limited role of model inputs in evaluation, and the equivalence of different failure modes. Shedding light on these assumptions enables us to question their appropriateness for ML system contexts, pointing the way towards more contextualized evaluation methodologies for robustly examining the trustworthiness of ML models.

