Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used to optimize deep neural network (DNN) models to meet performance, resource-utilization, and other requirements. Bugs in these compilers can produce compiled models whose semantics differ from the original, yielding incorrect results that corrupt the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz-testing approach for finding bugs in deep-learning compilers. Our core approach consists of (i) generating diverse yet valid DNN test models that exercise a large part of the compiler's transformation logic, using lightweight operator specifications; (ii) performing gradient-based search to find model inputs that avoid floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) using differential testing to identify bugs. We implemented this approach in NNSmith, which to date has found 72 new bugs in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 58 have been confirmed and 51 have been fixed by their respective project maintainers.
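The gradient-based search in step (ii) can be illustrated with a minimal NumPy sketch. Here a toy "model" contains an operator (`sqrt`) with a restricted domain, and a hand-written hinge penalty steers the input out of the region that produces NaN. The specific ops, the penalty, and all function names are our own assumptions for illustration; this is not NNSmith's actual implementation.

```python
import numpy as np

def model(x):
    # Toy "DNN": sqrt has a restricted domain (needs x - 3 >= 0),
    # so naive inputs easily produce exceptional values (NaN).
    h = np.sqrt(x - 3.0)
    return np.log(h + 0.5)  # h >= 0, so log's argument is always positive

def penalty(x, margin=1e-3):
    # Hinge loss: positive wherever x violates sqrt's domain constraint.
    return np.maximum(margin - (x - 3.0), 0.0).sum()

def search_valid_input(x, lr=0.5, steps=200):
    # Descend the penalty's (piecewise-constant) gradient: -1 on
    # violated entries, 0 elsewhere, until no constraint is violated.
    for _ in range(steps):
        if penalty(x) == 0.0:
            break
        grad = np.where(x - 3.0 < 1e-3, -1.0, 0.0)
        x = x - lr * grad
    return x

with np.errstate(invalid="ignore"):
    x0 = np.zeros(4)                    # naive input: model(x0) is all NaN
    assert np.isnan(model(x0)).all()
    x = search_valid_input(x0.copy())
    assert np.isfinite(model(x)).all()  # searched input avoids NaN/Inf
```

Once inputs are exceptional-value-free, comparing the outputs of two backends (step (iii)) becomes meaningful: a mismatch beyond floating-point tolerance indicates a miscompilation rather than an artifact of NaN/Inf propagation.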