Statistical Perspectives on Reliability of Artificial Intelligence Systems

Artificial intelligence (AI) systems have become increasingly popular in many areas. Nevertheless, AI technologies are still at a developing stage, and many issues need to be addressed. Among those, the reliability of AI systems needs to be demonstrated so that the general public can use them with confidence. In this paper, we provide statistical perspectives on the reliability of AI systems. Unlike other considerations, the reliability of AI systems focuses on the time dimension: the system should perform its designed functionality for the intended period. We introduce a so-called SMART statistical framework for AI reliability research, which includes five components: Structure of the system, Metrics of reliability, Analysis of failure causes, Reliability assessment, and Test planning. We review traditional methods in reliability data analysis and software reliability, and discuss how those existing methods can be transformed for reliability modeling and assessment of AI systems. We also describe recent developments in modeling and analysis of AI reliability and outline statistical research challenges in this area, including out-of-distribution detection, the effect of the training set, adversarial attacks, model accuracy, and uncertainty quantification. We discuss how those topics relate to AI reliability, with illustrative examples. Finally, we discuss data collection and test planning for AI reliability assessment and how to improve system designs for higher AI reliability. The paper closes with some concluding remarks.
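
As a point of reference for the time-dimension view described above, reliability is conventionally expressed through a survival function. The notation below (T for the time to failure and F for its cumulative distribution function) is an assumption introduced for illustration and is not taken from the abstract itself:

```latex
R(t) = \Pr(T > t) = 1 - F(t), \qquad t \ge 0 .
```

Under this reading, R(t) is the probability that the system still performs its designed functionality at time t.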
