DL frameworks are the basis for constructing all DL programs and models, and thus their bugs can lead to unexpected behaviors in any DL program or model relying on them. Such a wide-ranging impact demonstrates the necessity and importance of guaranteeing the quality of DL frameworks. Understanding the characteristics of DL framework bugs is a fundamental step toward this quality-assurance task, as it facilitates the design of effective bug detection and debugging approaches. Hence, in this work we conduct the largest-scale study to date on 1,000 bugs from four popular and diverse DL frameworks (i.e., TensorFlow, PyTorch, MXNet, and DL4J). By analyzing the root causes and symptoms of DL framework bugs associated with five components decomposed from DL frameworks, as well as measuring the test coverage achieved by three state-of-the-art testing techniques, we obtain 12 major findings that provide a comprehensive understanding of DL framework bugs and of the current status of DL framework testing practice, and we then distill a series of actionable guidelines for better DL framework bug detection and debugging. Finally, based on these guidelines, we design and implement a prototype DL framework testing tool, called TenFuzz, which is evaluated to be effective and finds 3 unknown bugs in the latest TensorFlow framework in a preliminary study, indicating the significance of our guidelines.