Artificial intelligence (AI) systems have become increasingly common, and this trend is expected to continue. Examples of AI systems include autonomous vehicles (AVs), computer vision, natural language processing, and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally, reliability assessment is based on reliability test data and subsequent statistical modeling and analysis. However, reliability data for AI systems are scarce because such data are typically sensitive and proprietary. The California Department of Motor Vehicles (DMV) oversees and regulates an AV testing program in which many AV manufacturers are conducting road tests. Manufacturers participating in the program are required to report recurrent disengagement events to the California DMV, and this information is made available to the public. In this paper, we use recurrent disengagement events as a representation of the reliability of the AI system in AVs and propose a statistical framework for modeling and analyzing the recurrent event data from AV driving tests. We use traditional parametric models from software reliability and propose a new nonparametric model based on monotonic splines to describe the event process. We develop inference procedures for selecting the best models, quantifying uncertainty, and testing heterogeneity in the event process. We then analyze the recurrent event data from four AV manufacturers and make inferences on the reliability of the AI systems in AVs. We also describe how the proposed analysis can be applied to assess the reliability of other AI systems.
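To make "traditional parametric models in software reliability" concrete, here is a minimal sketch of one such model fitted to recurrent event times. It assumes the events follow a nonhomogeneous Poisson process (NHPP) observed on a window [0, tau], and it uses the Goel-Okumoto mean function Lambda(t) = a(1 - exp(-b t)) as an illustrative choice. The choice of model, the time scale (for example, cumulative test miles), and all function names (`go_mean`, `go_loglik`, `fit_goel_okumoto`) are hypothetical assumptions for illustration. This is not the paper's method, and the monotone-spline nonparametric model is not shown. The NHPP log-likelihood is the sum of log-intensities at the event times minus Lambda(tau). Maximizing over `a` gives a = n / (1 - exp(-b tau)), and the remaining one-dimensional score in `b` is strictly decreasing, so bisection finds its root.

```python
import math
from typing import Sequence, Tuple


def go_mean(t: float, a: float, b: float) -> float:
    """Goel-Okumoto (assumed model) expected cumulative number of events by time t."""
    return a * (1.0 - math.exp(-b * t))


def go_loglik(times: Sequence[float], tau: float, a: float, b: float) -> float:
    """NHPP log-likelihood: sum of log intensities a*b*exp(-b*t_i) minus Lambda(tau)."""
    log_intensity = sum(math.log(a * b) - b * t for t in times)
    return log_intensity - go_mean(tau, a, b)


def _profile_score(b: float, n: int, s: float, tau: float) -> float:
    """Derivative in b of the log-likelihood after substituting the optimal a."""
    x = b * tau
    # For large x the last term is negligible; skipping it avoids overflow in expm1.
    tail = 0.0 if x > 700.0 else n * tau / math.expm1(x)
    return n / b - s - tail


def fit_goel_okumoto(times: Sequence[float], tau: float) -> Tuple[float, float]:
    """Hypothetical helper: maximum likelihood estimates (a, b) for event times in (0, tau]."""
    n = len(times)
    if n == 0 or any(not (0.0 < t <= tau) for t in times):
        raise ValueError("need at least one event time, each in (0, tau]")
    s = sum(times)
    # A finite MLE exists only if the mean event time is below tau / 2,
    # i.e. events are concentrated early (evidence of reliability growth).
    if s >= n * tau / 2.0:
        raise ValueError("MLE does not exist: mean event time >= tau / 2")
    lo, hi = 1e-8 / tau, 1.0 / tau
    while _profile_score(hi, n, s, tau) > 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection on the strictly decreasing score
        mid = 0.5 * (lo + hi)
        if _profile_score(mid, n, s, tau) > 0.0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / -math.expm1(-b * tau)  # closed-form optimum in a given b
    return a, b
```

For example, `fit_goel_okumoto([1, 2, 3, 5, 8, 13], 40.0)` returns estimates whose fitted mean function satisfies `go_mean(40.0, a, b) == 6` up to rounding, a standard property of this NHPP estimator. Model selection across several candidate mean functions could then compare maximized log-likelihoods (for example, via AIC), though that step is likewise only an assumption here.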