Forming a reliable judgement of a machine learning (ML) model's appropriateness for an application ecosystem is critical for its responsible use, and requires considering a broad range of factors, including harms, benefits, and responsibilities. In practice, however, evaluations of ML models frequently focus on only a narrow range of decontextualized predictive behaviours. We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations. Through an empirical study of papers from recent high-profile conferences in the Computer Vision and Natural Language Processing communities, we demonstrate a general focus on a handful of evaluation methods. By considering the metrics and test data distributions used in these methods, we draw attention to which properties of models are centered in the field, revealing the properties that are frequently neglected or sidelined during evaluation. By studying these properties, we show that the machine learning discipline implicitly assumes a range of commitments that have normative impacts; these include commitments to consequentialism, abstractability from context, the quantifiability of impacts, the limited role of model inputs in evaluation, and the equivalence of different failure modes. Shedding light on these assumptions enables us to question their appropriateness for ML system contexts, pointing the way towards more contextualized evaluation methodologies for robustly examining the trustworthiness of ML models.