The Relevance of Classic Fuzz Testing: Have We Solved This One?

Fuzz testing has passed its 30th anniversary, and in the face of the incredible progress in fuzz testing techniques and tools, the question arises whether the classic, basic fuzz technique is still useful and applicable. In that tradition, we have updated the basic fuzz tools and testing scripts and applied them to a large collection of Unix utilities on Linux, FreeBSD, and MacOS. As before, our failure criterion was whether the program crashed or hung. We found that 9 of 74 utilities crashed or hung on Linux, 15 of 78 on FreeBSD, and 12 of 76 on MacOS. A total of 24 different utilities failed across the three platforms. We note that these failure rates are somewhat higher than in our previous 1995, 2000, and 2006 studies of the reliability of command line utilities. In the basic fuzz tradition, we debugged each failed utility and categorized the causes of the failures. Classic categories of failures, such as pointer and array errors and not checking return codes, were still broadly present in the current results. In addition, we found a couple of new categories of failures appearing. We present examples of these failures to illustrate the programming practices that allowed them to happen. As a side note, we tested the limited number of utilities available in a modern programming language (Rust) and found them to be no more reliable than the standard ones.
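
The crash-or-hang criterion described above can be illustrated with a small harness. The following is a minimal sketch, not the authors' tool: the function names (`random_input`, `run_once`, `fuzz_utility`), the input length, and the 5-second hang timeout are all assumptions chosen for illustration. It assumes a POSIX system, where a child process killed by a signal is reported with a negative return code.

```python
import random
import subprocess


def random_input(length, seed):
    """Return `length` pseudo-random bytes, reproducible from `seed`."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def run_once(argv, data, timeout_s=5.0):
    """Feed `data` to the command on stdin and classify the outcome.

    Returns "hang" if the command exceeds the (assumed) timeout,
    "crash" if it was terminated by a signal, and "ok" otherwise.
    A nonzero exit status alone is not counted as a failure.
    """
    try:
        proc = subprocess.run(
            argv,
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    # On POSIX, a negative return code means death by signal (e.g. SIGSEGV).
    return "crash" if proc.returncode < 0 else "ok"


def fuzz_utility(argv, trials=10, length=10_000, timeout_s=5.0):
    """Run several random trials; return (seed, verdict) for each failure."""
    failures = []
    for seed in range(trials):
        verdict = run_once(argv, random_input(length, seed), timeout_s)
        if verdict != "ok":
            failures.append((seed, verdict))
    return failures
```

For example, `fuzz_utility(["sort"])` would feed ten random inputs to `sort` and return an empty list if none of them caused a crash or hang. Recording the seed for each failure makes the failing input reproducible, which is what allows a failed utility to be debugged afterwards.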
