Fuzzing is an automated software testing technique broadly adopted by industry. A popular variant is mutation-based fuzzing, which discovers a large number of bugs in practice. Although the research community has studied mutation-based fuzzing for years, the interactions among the algorithms within a fuzzer are highly complex and, together with the randomness in every fuzzer instance, can lead to unpredictable effects. Most efforts to improve this fragile interaction have focused on optimizing seed scheduling. However, real-world results such as Google's FuzzBench show that these approaches do not consistently yield improvements in practice. Another way to improve the fuzzing process algorithmically is to optimize mutation scheduling. Unfortunately, existing mutation scheduling approaches have also failed to convince, either because they lack real-world improvements or because they expose too many user-controlled parameters whose configuration requires expert knowledge about the target program. This leaves unsolved the challenging problem of processing test cases cleverly enough to achieve a measurable improvement. We present DARWIN, a novel mutation scheduler and the first to show fuzzing improvements in a realistic scenario without introducing additional user-configurable parameters, opening this approach to the broad fuzzing community. DARWIN uses an Evolution Strategy to systematically optimize and adapt the probability distribution of the mutation operators during fuzzing. We implemented a prototype based on the popular general-purpose fuzzer AFL. DARWIN significantly outperforms the state-of-the-art mutation scheduler and the AFL baseline in our own coverage experiment, in FuzzBench, and in the MAGMA benchmark, where it finds 15 out of 21 bugs the fastest. Finally, DARWIN found 20 unique bugs (including one novel bug), 66% more than AFL, in widely-used real-world applications.
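To make the idea of adapting a probability distribution over mutation operators concrete, the following is a minimal Python sketch of a (1+1) Evolution Strategy. It is an illustration under stated assumptions, not DARWIN's actual implementation: the abstract does not specify the ES variant, the fitness signal, or any internal constants. The names `evolve`, `evaluate`, and `operator_payoff`, the step size `sigma`, and the simulated "new coverage" reward are all hypothetical and exist only for this toy example.

```python
import random


def normalize(weights):
    """Scale positive weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def mutate_distribution(probs, sigma, rng):
    """Add Gaussian noise to each probability, clamp to stay positive, renormalize."""
    noisy = [max(1e-6, p + rng.gauss(0.0, sigma)) for p in probs]
    return normalize(noisy)


def evaluate(probs, operator_payoff, trials, rng):
    """Toy fitness (assumption): sample operators from `probs` and count
    simulated 'new coverage' events. A real fuzzer would run the target here."""
    hits = 0
    for _ in range(trials):
        op = rng.choices(range(len(probs)), weights=probs)[0]
        if rng.random() < operator_payoff[op]:
            hits += 1
    return hits


def evolve(operator_payoff, generations=100, trials=200, sigma=0.05, seed=0):
    """(1+1)-ES: keep one parent distribution, try one perturbed child per
    generation, and keep whichever scored at least as well in that round."""
    rng = random.Random(seed)
    n = len(operator_payoff)
    parent = [1.0 / n] * n  # start from a uniform operator distribution
    for _ in range(generations):
        child = mutate_distribution(parent, sigma, rng)
        # Re-evaluate the parent each round because the fitness is noisy.
        parent_fitness = evaluate(parent, operator_payoff, trials, rng)
        child_fitness = evaluate(child, operator_payoff, trials, rng)
        if child_fitness >= parent_fitness:
            parent = child
    return parent


if __name__ == "__main__":
    # Hypothetical per-operator chance of producing new coverage.
    payoff = [0.01, 0.02, 0.20, 0.05]
    for op, p in enumerate(evolve(payoff)):
        print(f"operator {op}: probability {p:.3f}")
```

In this sketch the parent is re-evaluated every generation so that a single lucky measurement cannot block later candidates, which is one simple way to cope with the randomness the abstract mentions. The constants `generations`, `trials`, and `sigma` are artifacts of the toy setup and say nothing about how DARWIN avoids user-configurable parameters.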