The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work. In this paper, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
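The best-performing pipeline described above (transfer-learning features, dimensionality reduction, then density-based clustering) can be sketched as follows. This is a minimal illustration, not the paper's implementation: synthetic random vectors stand in for features extracted from a pre-trained backbone, and scikit-learn's PCA stands in for UMAP (the paper uses UMAP, available in the `umap-learn` package) so the sketch needs only scikit-learn and NumPy.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Hypothetical setup: each failing image yields a 512-d feature vector
# from a pre-trained DNN (transfer learning). Two synthetic failure
# scenarios are simulated as two well-separated Gaussian blobs.
scenario_a = rng.normal(loc=0.0, scale=0.1, size=(40, 512))
scenario_b = rng.normal(loc=1.0, scale=0.1, size=(40, 512))
features = np.vstack([scenario_a, scenario_b])

# Reduce to a low-dimensional embedding (UMAP in the paper; PCA here
# as a dependency-light stand-in).
embedding = PCA(n_components=2).fit_transform(features)

# Density-based clustering groups images belonging to the same
# failure scenario; label -1 marks noise points.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embedding)
n_clusters = len(set(labels) - {-1})
print(n_clusters)
```

With well-separated scenarios, DBSCAN recovers one cluster per failure scenario without being told the number of clusters in advance, which is the property the abstract highlights: each root cause of failure ends up in its own cluster.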