DeepXplore: Automated Whitebox Testing of Deep Learning Systems

Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
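To make the first two ideas in the abstract concrete, the following is a minimal sketch, not DeepXplore's actual implementation. It assumes a toy ReLU network stored as a list of `(W, b)` pairs and treats a neuron as "covered" when its activation exceeds a threshold for at least one test input. The helper names (`forward`, `neuron_coverage`, `differential_inputs`), the threshold value, and the toy models are all hypothetical and chosen only for illustration. The gradient-based joint optimization step is not shown.

```python
import numpy as np


def forward(weights, x):
    """Run a tiny ReLU MLP; return (logits, list of hidden-layer activations)."""
    activations = []
    h = x
    for W, b in weights[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer
        activations.append(h)
    W, b = weights[-1]
    return h @ W + b, activations


def neuron_coverage(weights, inputs, threshold=0.0):
    """Fraction of hidden neurons activated above `threshold` by at least one input.

    The threshold is an assumption of this sketch; the abstract does not fix one.
    """
    _, activations = forward(weights, inputs)
    covered = np.concatenate([np.any(a > threshold, axis=0) for a in activations])
    return covered.sum() / covered.size


def differential_inputs(models, inputs):
    """Indices of inputs on which the models' predicted classes disagree.

    Models with similar functionality act as cross-referencing oracles:
    a disagreement flags an input worth inspecting, with no manual label needed.
    """
    preds = np.stack([forward(m, inputs)[0].argmax(axis=1) for m in models])
    return np.flatnonzero(np.any(preds != preds[0], axis=0))


# Two hypothetical toy models: same hidden layer, different output layers.
hidden = (np.eye(2), np.zeros(2))
model_a = [hidden, (np.eye(2), np.zeros(2))]
model_b = [hidden, (np.array([[0.0, 1.0], [1.0, 0.0]]), np.zeros(2))]

tests = np.array([[1.0, -1.0], [-1.0, -1.0]])
print("neuron coverage:", neuron_coverage(model_a, tests))
print("disagreeing inputs:", differential_inputs([model_a, model_b], tests))
```

In this toy setup the test set activates only one of the two hidden neurons, so adding an input that activates the other would raise coverage. A search procedure of the kind the abstract describes would look for inputs that increase both the coverage figure and the number of disagreements at once.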
