DL frameworks are the basis for constructing all DL programs and models, and thus their bugs can lead to unexpected behaviors in any DL program or model relying on them. Such a wide-reaching effect demonstrates the necessity and importance of guaranteeing the quality of DL frameworks. Understanding the characteristics of DL framework bugs is a fundamental step for this quality-assurance task, as it facilitates the design of effective bug detection and debugging approaches. Hence, in this work we conduct the largest study to date on 800 bugs from four popular and diverse DL frameworks (i.e., TensorFlow, PyTorch, MXNet, and DL4J). By analyzing the root causes and symptoms of DL framework bugs associated with 5 components decomposed from DL frameworks, as well as measuring the test coverage achieved by three state-of-the-art testing techniques and the developer effort spent on fixing these bugs, we obtain 14 major findings that provide a comprehensive understanding of DL framework bugs and the current status of DL framework testing and debugging practice, and we then provide a series of actionable guidelines for better DL framework bug detection and debugging.