Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source

Deep learning (DL) systems can make our lives much easier and are thus gaining more and more attention from both academia and industry. Meanwhile, bugs in DL systems can be disastrous and can even threaten human lives in safety-critical applications. To date, a huge body of research has been dedicated to testing DL models. Interestingly, however, there is still limited work on testing the underlying DL libraries, which are the foundation for building, optimizing, and running DL models. One potential reason is that test generation for DL libraries can be rather challenging: their public APIs are mainly exposed in Python, and dynamic typing makes it hard even to determine the types of API input parameters automatically. In this paper, we propose FreeFuzz, the first approach to fuzzing DL libraries via mining from open source. More specifically, FreeFuzz obtains code/models from three different sources: 1) code snippets from the library documentation, 2) library developer tests, and 3) DL models in the wild. FreeFuzz then automatically runs all the collected code/models with instrumentation to trace the dynamic information for each covered API, including the types and values of each parameter during invocation and the shapes of input/output tensors. Lastly, FreeFuzz leverages the traced dynamic information to perform fuzz testing for each covered API. An extensive study of FreeFuzz on PyTorch and TensorFlow, two of the most popular DL libraries, shows that FreeFuzz is able to automatically trace valid dynamic information for fuzzing 1158 popular APIs, 9X more than the state-of-the-art LEMON, with 3.5X lower overhead. To date, FreeFuzz has detected 49 bugs for PyTorch and TensorFlow (with 38 already confirmed by developers as previously unknown).
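
The trace-then-fuzz idea described in the abstract can be illustrated with a minimal sketch. This is not FreeFuzz's implementation: `TRACE_DB`, `describe`, `traced`, `mutate`, `fuzz`, and the toy API `toy_pool` are all hypothetical names introduced here, and a pure-Python function stands in for a real PyTorch or TensorFlow API so the example runs without either library. The sketch records the type, value, and (when available) shape of every argument at call time, then replays mutated versions of the recorded calls and collects any exceptions for later triage.

```python
import functools
import random

# Hypothetical in-memory trace store: API name -> list of recorded invocations.
TRACE_DB = {}


def describe(value):
    """Record the type name, the value, and a shape if the value exposes one."""
    shape = getattr(value, "shape", None)
    return {
        "type": type(value).__name__,
        "value": value,
        "shape": tuple(shape) if shape is not None else None,
    }


def traced(api_name):
    """Wrap a callable so that every invocation is recorded under api_name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {
                "args": [describe(a) for a in args],
                "kwargs": {k: describe(v) for k, v in kwargs.items()},
            }
            TRACE_DB.setdefault(api_name, []).append(record)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def mutate(value, rng):
    """Type-preserving value mutation (one simple strategy, assumed here)."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return not value
    if isinstance(value, int):
        return value + rng.choice([-1, 1]) * rng.randint(1, 8)
    if isinstance(value, float):
        return value * rng.uniform(-2.0, 2.0)
    if isinstance(value, (list, tuple)):
        return type(value)(mutate(v, rng) for v in value)
    return value  # unknown types are passed through unchanged


def fuzz(api_name, func, rounds=20, seed=0):
    """Replay mutated versions of traced calls and collect raised exceptions."""
    rng = random.Random(seed)
    failures = []
    for _ in range(rounds):
        record = rng.choice(TRACE_DB[api_name])
        args = [mutate(a["value"], rng) for a in record["args"]]
        kwargs = {k: mutate(v["value"], rng) for k, v in record["kwargs"].items()}
        try:
            func(*args, **kwargs)
        except Exception as exc:
            failures.append((args, kwargs, repr(exc)))
    return failures


def toy_pool(values, kernel_size=2, stride=2):
    """Toy stand-in for a library API: maximum over sliding windows."""
    if kernel_size <= 0 or stride <= 0:
        raise ValueError("kernel_size and stride must be positive")
    last_start = len(values) - kernel_size + 1
    return [max(values[i:i + kernel_size]) for i in range(0, last_start, stride)]


# Step 1: run "collected code" against the instrumented API to gather traces.
traced_pool = traced("toy_pool")(toy_pool)
seed_output = traced_pool([1.0, 3.0, 2.0, 5.0], kernel_size=2, stride=2)

# Step 2: fuzz the uninstrumented API using the traced invocation as a seed.
failures = fuzz("toy_pool", toy_pool)
print(len(TRACE_DB["toy_pool"]), "trace(s);", len(failures), "mutated call(s) raised")
```

In this sketch the fuzzer calls the unwrapped `toy_pool`, so mutated inputs do not pollute the trace store. Deciding which recorded exceptions are genuine bugs rather than expected input validation is left out and would need a separate oracle.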
