In the past decade, Deep Learning (DL) systems have been widely deployed in various domains to facilitate our daily lives. However, ensuring the correctness of DL systems is extremely challenging (e.g., due to their intrinsic nondeterminism), and bugs in DL systems can have serious consequences and may even threaten human lives. Because the quality of DL models directly affects the behavior of the systems built on them, researchers have explored various techniques to test, analyze, and verify those models. More recently, researchers have also proposed techniques for testing the underlying operator-level DL libraries (such as TensorFlow and PyTorch), which provide general binary implementations of each high-level DL operator so that DL models can run on many platforms. There is still limited work, however, on the reliability of the emerging tensor compilers, which directly compile high-level tensor computation graphs into high-performance binaries for better efficiency, portability, and scalability. In this paper, we target the important problem of tensor compiler testing and propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler. Tzer focuses on mutating TVM's low-level Intermediate Representation (IR), because the mutation space of the high-level IR is limited. More specifically, Tzer performs evolutionary IR mutation guided by coverage feedback, using both general-purpose and tensor-compiler-specific mutators; it also performs pass mutation in tandem with IR mutation for more effective fuzzing. Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing, with 75% higher coverage and 50% more valuable tests than the second-best technique. To date, Tzer has detected 49 previously unknown bugs in TVM, of which 37 have been confirmed and 25 fixed (PR merged).
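To make the idea of coverage-guided evolutionary fuzzing with joint IR and pass mutation concrete, the following is a minimal sketch and not Tzer's actual implementation. Everything in it is an assumption made for illustration: the IR is a toy tuple of integers, `PASSES` holds hypothetical pass names, `toy_compile` is a stand-in that returns a set of "covered branches", and choosing between the two mutation kinds with a coin flip is a simplification. No TVM API is used.

```python
import random

# Hypothetical pass names for illustration only; these are not TVM passes.
PASSES = ("simplify", "unroll", "vectorize", "fold")


def toy_compile(ir, passes):
    """Toy stand-in for a compiler run: returns the set of covered branch ids."""
    covered = {("len", min(len(ir), 4))}
    for op in ir:
        covered.add(("op", op % 7))
        for p in passes:
            if (op + len(p)) % 3 == 0:
                covered.add(("pass", p, op % 5))
    return covered


def mutate_ir(ir, rng):
    """General-purpose IR mutation: replace, insert, or delete one element."""
    ir = list(ir)
    choice = rng.choice(("replace", "insert", "delete"))
    if choice == "replace" and ir:
        ir[rng.randrange(len(ir))] = rng.randrange(100)
    elif choice == "insert" or not ir:
        ir.insert(rng.randrange(len(ir) + 1), rng.randrange(100))
    elif len(ir) > 1:
        del ir[rng.randrange(len(ir))]
    return tuple(ir)


def mutate_passes(rng):
    """Pass mutation: draw a fresh random sequence of distinct passes."""
    k = rng.randint(1, len(PASSES))
    return tuple(rng.sample(PASSES, k))


def fuzz(iterations=500, seed=0):
    """Keep a mutant in the seed pool only if it reaches new coverage."""
    rng = random.Random(seed)
    pool = [((1, 2, 3), ("simplify",))]
    total = set(toy_compile(*pool[0]))
    for _ in range(iterations):
        ir, passes = rng.choice(pool)
        # Assumption: mutate either the IR or the pass sequence each round.
        if rng.random() < 0.5:
            ir = mutate_ir(ir, rng)
        else:
            passes = mutate_passes(rng)
        cov = toy_compile(ir, passes)
        if not cov <= total:  # the mutant covered something new
            total |= cov
            pool.append((ir, passes))
    return pool, total
```

Calling `fuzz(200, seed=1)` returns the evolved seed pool and the accumulated coverage set. In a real setting the coverage set would come from instrumenting the compiler, and the mutators would operate on actual IR nodes rather than integers.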