Fuzzing is one of the prevailing methods for vulnerability detection. However, even state-of-the-art fuzzers become ineffective after some period of time: coverage hardly improves because existing methods fail to focus the fuzzing effort on hard-to-trigger program paths. In other words, they cannot generate inputs that break through the bottleneck, owing to the fundamental difficulty of capturing the complex relations between test inputs and program coverage. In particular, existing fuzzers suffer from two main limitations: 1) they lack an overall analysis of the program to identify the most "rewarding" seeds, and 2) they lack an effective mutation strategy that continuously selects and mutates the more relevant "bytes" of the seeds. In this work, we propose an approach called ATTuzz to address these two issues systematically. First, we propose a lightweight dynamic analysis technique that estimates the "reward" of covering each basic block and selects the most rewarding seeds accordingly. Second, we mutate the selected seeds according to a neural network model that predicts whether a certain "rewarding" block will be covered given a certain mutation on certain bytes of a seed. The model is a deep learning model equipped with an attention mechanism, and it is learned and updated periodically during fuzzing. Our evaluation shows that ATTuzz significantly outperforms 5 state-of-the-art grey-box fuzzers on 13 popular real-world programs, achieving higher edge coverage and finding new bugs. In particular, ATTuzz achieved 2x the edge coverage of AFL and detected 4x as many bugs over 24-hour runs. Moreover, ATTuzz continues to improve edge coverage in the long run, achieving 50% more coverage than AFL in 5 days.
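To make the first step more concrete, the following is a minimal Python sketch of reward-guided seed selection. It is not the ATTuzz implementation: the abstract does not define how the "reward" is computed, so the sketch assumes that the reward of an uncovered basic block is the number of still-uncovered blocks reachable from it in the control-flow graph. The names `cfg`, `seed_coverage`, `block_rewards`, and `rank_seeds` are hypothetical and chosen only for illustration.

```python
from collections import deque


def reachable_uncovered(cfg, start, covered):
    """Count uncovered blocks reachable from `start` (inclusive) via BFS."""
    seen = {start}
    queue = deque([start])
    count = 0
    while queue:
        block = queue.popleft()
        if block not in covered:
            count += 1
        for succ in cfg.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return count


def block_rewards(cfg, covered):
    """Assumed reward: for every uncovered block directly behind covered code,
    the number of uncovered blocks that reaching it could unlock."""
    rewards = {}
    for block in covered:
        for succ in cfg.get(block, ()):
            if succ not in covered:
                rewards[succ] = reachable_uncovered(cfg, succ, covered)
    return rewards


def rank_seeds(cfg, seed_coverage):
    """Rank seeds by the total reward of the uncovered blocks they border."""
    covered = set().union(*seed_coverage.values())
    rewards = block_rewards(cfg, covered)
    scores = {}
    for seed, blocks in seed_coverage.items():
        frontier = {s for b in blocks for s in cfg.get(b, ()) if s in rewards}
        scores[seed] = sum(rewards[s] for s in frontier)
    ranking = sorted(scores, key=lambda s: scores[s], reverse=True)
    return ranking, scores, rewards


# Toy control-flow graph (block -> successors); purely illustrative.
cfg = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E", "F"],
    "D": [],
    "E": ["G"],
    "F": ["G"],
    "G": [],
}
# Blocks each seed is assumed to cover, e.g. as collected from instrumentation.
seed_coverage = {"seed1": {"A", "B", "D"}, "seed2": {"A", "C"}}

ranking, scores, rewards = rank_seeds(cfg, seed_coverage)
print(ranking, scores, rewards)
```

In this toy graph, only `seed2` reaches a block (`C`) that borders uncovered code, so it is ranked first. A real fuzzer would derive the graph and the coverage sets from instrumentation and would refresh the rewards as coverage grows.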