A growing body of research has been dedicated to deep learning (DL) model testing. However, there is still limited work on testing DL libraries, which serve as the foundation for building, training, and running DL models. Prior work on fuzzing DL libraries can only generate tests for APIs that have been invoked by documentation examples, developer tests, or DL models, leaving a large number of APIs untested. In this paper, we propose DeepREL, the first approach to automatically inferring relational APIs for more effective DL library fuzzing. Our basic hypothesis is that, for a DL library under test, there may exist a number of APIs that share similar input parameters and outputs; in this way, we can easily "borrow" test inputs from invoked APIs to test other relational APIs. Furthermore, we formalize the notions of value equivalence and status equivalence for relational APIs to serve as the oracle for effective bug finding. We have implemented DeepREL as a fully automated, end-to-end relational API inference and fuzzing technique for DL libraries, which 1) automatically infers potential API relations based on API syntactic or semantic information, 2) synthesizes concrete test programs for invoking relational APIs, 3) validates the inferred relational APIs via representative test inputs, and finally 4) performs fuzzing on the verified relational APIs to find potential inconsistencies. Our evaluation on two of the most popular DL libraries, PyTorch and TensorFlow, demonstrates that DeepREL can cover 157% more APIs than the state-of-the-art FreeFuzz. To date, DeepREL has detected 162 bugs in total, with 106 already confirmed by the developers as previously unknown bugs. Surprisingly, DeepREL has detected 13.5% of the high-priority bugs for the entire PyTorch issue-tracking system in a three-month period. Besides the 162 code bugs, we have also detected 14 documentation bugs (all confirmed).
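The value- and status-equivalence oracles, together with the final fuzzing step, can be illustrated with a small sketch. This is a minimal, library-agnostic illustration under stated assumptions, not DeepREL's implementation. The helpers `run`, `values_close`, `check_pair`, `fuzz_pair`, and `pow_half` are hypothetical names. Standard-library functions stand in for a pair of relational DL-library APIs (`math.sqrt` vs. `x ** 0.5`, and `math.floor` vs. `int`) so that the example runs without PyTorch or TensorFlow. Under status equivalence, the two calls must both succeed or both fail. Under value equivalence, successful calls must additionally return matching outputs, here compared with an assumed relative tolerance.

```python
import cmath
import math
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Outcome:
    """Result of one API call: success flag plus return value or exception name."""
    ok: bool
    value: Any = None
    error: str = ""


def run(api: Callable[..., Any], args: Tuple[Any, ...]) -> Outcome:
    """Invoke `api` on `args`, recording either its return value or its exception."""
    try:
        return Outcome(ok=True, value=api(*args))
    except Exception as exc:  # any raised exception counts as an error status
        return Outcome(ok=False, error=type(exc).__name__)


def values_close(a: Any, b: Any, rel_tol: float = 1e-6) -> bool:
    """Tolerant comparison for numbers (including complex); exact equality otherwise.

    The tolerance is an assumption; a real harness would compare tensors.
    """
    numeric = (int, float, complex)
    if isinstance(a, numeric) and isinstance(b, numeric):
        return cmath.isclose(a, b, rel_tol=rel_tol)
    return a == b


def check_pair(api_a, api_b, args, relation: str) -> str:
    """Apply the oracle for a hypothetical relational API pair.

    relation == "status": both calls must succeed, or both must fail.
    relation == "value":  in addition, successful outputs must match.
    """
    a, b = run(api_a, args), run(api_b, args)
    if a.ok != b.ok:
        return "status mismatch"
    if relation == "value" and a.ok and not values_close(a.value, b.value):
        return "value mismatch"
    return "consistent"


def fuzz_pair(api_a, api_b, relation: str, trials: int = 200,
              seed: int = 0) -> List[Tuple[float, str]]:
    """Feed random scalar inputs to both APIs and collect every inconsistency."""
    rng = random.Random(seed)
    found = []
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)
        verdict = check_pair(api_a, api_b, (x,), relation)
        if verdict != "consistent":
            found.append((x, verdict))
    return found


def pow_half(x):
    """Stand-in for a second 'square root' API."""
    return x ** 0.5


if __name__ == "__main__":
    print(check_pair(math.sqrt, pow_half, (4.0,), "value"))   # consistent
    print(check_pair(math.sqrt, pow_half, (-1.0,), "value"))  # status mismatch
    print(check_pair(math.floor, int, (-2.5,), "value"))      # value mismatch
    print(check_pair(math.floor, int, (-2.5,), "status"))     # consistent
    print(len(fuzz_pair(math.sqrt, pow_half, "value")) > 0)   # True
```

The status check runs first because comparing outputs is only meaningful when both calls succeed. In the toy pairs above, `math.sqrt` raises `ValueError` on a negative input, while `x ** 0.5` returns a complex number. That is a status inconsistency. `math.floor(-2.5)` returns `-3` and `int(-2.5)` returns `-2`, so that pair is status-equivalent but not value-equivalent.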