Deep learning (DL) has attracted wide attention and has been widely deployed in recent years. As a result, more and more research effort has been dedicated to testing DL libraries and frameworks. However, existing work has largely overlooked one crucial component of any DL system, automatic differentiation (AD), which is the basis for the recent development of DL. To this end, we propose $\nabla$Fuzz, the first general and practical approach specifically targeting the critical AD component in DL libraries. Our key insight is that each DL library API can be abstracted into a function processing tensors/vectors, which can be differentially tested under various execution scenarios (for computing outputs/gradients with different implementations). We have implemented $\nabla$Fuzz as a fully automated API-level fuzzer targeting AD in DL libraries. It uses differential testing across these execution scenarios to test both first-order and high-order gradients, and it includes automated filtering strategies to remove false positives caused by numerical instability. We have performed an extensive study on four of the most popular and actively maintained DL libraries: PyTorch, TensorFlow, JAX, and OneFlow. The results show that $\nabla$Fuzz substantially outperforms state-of-the-art fuzzers in terms of both code coverage and bug detection. To date, $\nabla$Fuzz has detected 173 bugs in the studied DL libraries, with 144 already confirmed by developers (117 of which are previously unknown bugs and 107 of which are related to AD). Remarkably, $\nabla$Fuzz contributed 58.3% (7/12) of all high-priority AD bugs for PyTorch and JAX during a two-month period. None of the confirmed AD bugs were detected by existing fuzzers.
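To make the key insight concrete, the following is a minimal sketch of differential testing for AD. It is not the $\nabla$Fuzz implementation. It assumes a toy forward-mode AD built on dual numbers in place of a real DL library, and every name in it (`Dual`, `api_under_test`, `buggy_square`, `check`) is hypothetical. The same function is run in three scenarios (direct invocation, AD, and numerical differentiation), and both the outputs and the gradients are compared. The numerical-instability filtering mentioned above is not modeled.

```python
import math


class Dual:
    """A value paired with its derivative (toy forward-mode AD)."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine that works on both plain floats and Dual numbers."""
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)


def api_under_test(x):
    """Hypothetical stand-in for a library API: f(x) = x*sin(x) + 3x."""
    return x * sin(x) + 3 * x


def buggy_square(x):
    """x**2 with a deliberately wrong derivative rule (missing factor 2)."""
    if isinstance(x, Dual):
        return Dual(x.val ** 2, x.val * x.dot)
    return x ** 2


def ad_gradient(f, x):
    """Gradient from the AD scenario: seed the input derivative with 1."""
    return f(Dual(x, 1.0)).dot


def numerical_gradient(f, x, eps=1e-6):
    """Gradient from the numerical scenario: central finite difference."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def check(f, x, tol=1e-4):
    """Compare outputs and gradients across the execution scenarios."""
    direct = f(x)
    via_ad = f(Dual(x, 1.0))
    if not math.isclose(direct, via_ad.val, rel_tol=tol, abs_tol=tol):
        return "output mismatch"
    if not math.isclose(via_ad.dot, numerical_gradient(f, x),
                        rel_tol=tol, abs_tol=tol):
        return "gradient mismatch"
    return "consistent"


if __name__ == "__main__":
    print(check(api_under_test, 0.7))
    print(check(buggy_square, 1.5))
```

In this sketch the seeded bug in `buggy_square` leaves the output correct and only corrupts the derivative, so it surfaces only when gradients are compared against an independent scenario. A real fuzzer would additionally need to generate valid inputs automatically and to separate genuine bugs from differences caused by floating-point precision.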