Deep neural networks (DNNs) are widely used in application domains such as image processing, speech recognition, and natural language processing. However, testing DNN models is challenging due to the complexity and size of their input domain. In particular, testing DNN models often requires generating or exploring large unlabeled datasets. In practice, DNN test oracles, which determine the correct outputs for given inputs, typically require expensive manual effort to label test data, possibly involving multiple experts to ensure labeling correctness. In this paper, we propose DeepGD, a black-box multi-objective test selection approach for DNN models. It reduces labeling cost by prioritizing the selection of test inputs with high fault-revealing power from large unlabeled datasets. DeepGD not only selects test inputs with high uncertainty scores, so as to trigger as many mispredicted inputs as possible, but also maximizes the probability of revealing distinct faults in the DNN model by selecting diverse mispredicted inputs. Experimental results on four widely used datasets and five DNN models show that, in terms of fault-revealing ability: (1) white-box, coverage-based approaches fare poorly, (2) DeepGD outperforms existing black-box test selection approaches, and (3) DeepGD provides better guidance for DNN model retraining when the selected inputs are used to augment the training set.
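To make the selection idea concrete, the following is a minimal, illustrative sketch (not DeepGD's actual multi-objective search) of how one might combine an uncertainty score with a diversity criterion to pick unlabeled inputs for labeling. The Gini-style uncertainty score, the hypothetical feature embeddings, and the greedy trade-off parameter `alpha` are all assumptions introduced here for illustration:

```python
import numpy as np

def gini_uncertainty(probs):
    """Gini impurity of a softmax vector: 1 - sum(p_i^2).
    Higher values mean the model is less certain about the input."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(probs ** 2, axis=-1)

def select_diverse_uncertain(features, probs, budget, alpha=0.5):
    """Greedily select `budget` inputs, balancing uncertainty and diversity.

    features: (n, d) array of input embeddings (hypothetical extractor).
    probs:    (n, k) softmax outputs of the model under test.
    budget:   number of unlabeled inputs to pick for labeling.
    alpha:    trade-off between uncertainty (alpha=1) and diversity (alpha=0).
    """
    uncertainty = gini_uncertainty(probs)
    # Seed the selection with the single most uncertain input.
    selected = [int(np.argmax(uncertainty))]
    while len(selected) < budget:
        chosen = features[selected]
        # Diversity score: distance to the nearest already-selected input.
        dists = np.min(
            np.linalg.norm(features[:, None, :] - chosen[None, :, :], axis=-1),
            axis=1,
        )
        score = alpha * uncertainty + (1 - alpha) * dists / (dists.max() + 1e-12)
        score[selected] = -np.inf  # never re-pick an input
        selected.append(int(np.argmax(score)))
    return selected
```

This greedy max-min heuristic only illustrates the two objectives named in the abstract; the paper itself formulates selection as a multi-objective optimization problem.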