Reliable application of machine learning-based decision systems in the wild is one of the major challenges currently investigated by the field. A large portion of established approaches aims to detect erroneous predictions by assigning confidence scores. This confidence may be obtained by quantifying the model's predictive uncertainty, learning explicit scoring functions, or assessing whether the input is in line with the training distribution. Curiously, while these approaches all claim to address the same eventual goal of detecting failures of a classifier upon real-life application, they currently constitute largely separated research fields with individual evaluation protocols, which either exclude a substantial part of relevant methods or ignore large parts of relevant failure sources. In this work, we systematically reveal current pitfalls caused by these inconsistencies and derive requirements for a holistic and realistic evaluation of failure detection. To demonstrate the relevance of this unified perspective, we present a large-scale empirical study that, for the first time, enables benchmarking of confidence scoring functions with respect to all relevant methods and failure sources. The finding that a simple softmax response baseline is the overall best-performing method underlines the drastic shortcomings of current evaluation practice amid the abundance of published research on confidence scoring. Code and trained models are available at https://github.com/IML-DKFZ/fd-shifts.
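For context, the softmax response baseline referred to above uses the maximum softmax probability of the classifier as its confidence score; failure detection then amounts to checking whether correct predictions receive higher scores than misclassifications. The following minimal NumPy sketch illustrates this idea only; it is not the FD-Shifts implementation, and the function names and toy data are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_response(logits):
    """Maximum softmax probability ('softmax response') as a confidence score."""
    return softmax(logits).max(axis=-1)

# Toy example: score predictions and compare confidence of correct vs. failed cases.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 3))        # 5 samples, 3 classes (synthetic)
labels = rng.integers(0, 3, size=5)

preds = logits.argmax(axis=-1)
conf = softmax_response(logits)
correct = preds == labels

# In a real evaluation, the ranking quality of `conf` w.r.t. `correct` would be
# summarized by a metric such as AUROC or AURC (not computed in this sketch).
for c, ok in zip(conf, correct):
    print(f"confidence={c:.3f}  correct={bool(ok)}")
```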