基于极值理论的差分模糊测试风险估计 (Risk Estimation in Differential Fuzzing via Extreme Value Theory)

Differential testing is a highly effective technique for automatically detecting software bugs and vulnerabilities when the specifications involve an analysis over multiple executions simultaneously. Differential fuzzing, in particular, operates as a guided randomized search, aiming to find (similar) inputs that lead to a maximum difference in software outputs or their behaviors. However, fuzzing, as a dynamic analysis, lacks any guarantees on the absence of bugs: from a differential fuzzing campaign that has observed no bugs (or a minimal difference), what is the risk of observing a bug (or a larger difference) if we run the fuzzer for one or more steps? This paper investigates the application of Extreme Value Theory (EVT) to address the risk of missing or underestimating bugs in differential fuzzing. The key observation is that differential fuzzing as a random process resembles the maximum distribution of observed differences. Hence, EVT, a branch of statistics dealing with extreme values, is an ideal framework to analyze the tail of the differential fuzzing campaign to contain the risk. We perform experiments on a set of real-world Java libraries and use differential fuzzing to find information leaks via side channels in these libraries. We first explore the feasibility of EVT for this task and the optimal hyperparameters for EVT distributions. We then compare EVT-based extrapolation against baseline statistical methods like Markov's as well as Chebyshev's inequalities, and the Bayes factor. EVT-based extrapolations outperform the baseline techniques in 14.3% of cases and tie with the baseline in 64.2% of cases. Finally, we evaluate the accuracy and performance gains of EVT-enabled differential fuzzing in real-world Java libraries, where we reported an average saving of tens of millions of bytecode executions by an early stop.

翻译：差分测试是一种在涉及同时对多个执行过程进行分析的规范下，自动检测软件缺陷与漏洞的高效技术。差分模糊测试尤其作为一种引导式随机搜索运行，旨在寻找导致软件输出或其行为产生最大差异的（相似）输入。然而，模糊测试作为一种动态分析，无法保证缺陷的完全不存在：从一个未观察到缺陷（或差异极小）的差分模糊测试活动中，如果我们继续运行模糊测试一步或多步，观察到缺陷（或更大差异）的风险是多少？本文研究了应用极值理论（EVT）来解决差分模糊测试中遗漏或低估缺陷的风险。关键观察在于，差分模糊测试作为一个随机过程，类似于观测差异的最大值分布。因此，极值理论——统计学中处理极端值的一个分支——是分析差分模糊测试活动尾部以控制风险的理想框架。我们在真实世界的Java库集合上进行实验，利用差分模糊测试来发现这些库中通过侧信道的信息泄露。我们首先探讨了极值理论在此任务中的可行性及其分布的最优超参数。随后，我们将基于极值理论的外推方法与基线统计方法（如马尔可夫不等式、切比雪夫不等式）以及贝叶斯因子进行比较。基于极值理论的外推在14.3%的情况下优于基线技术，在64.2%的情况下与基线持平。最后，我们评估了启用极值理论的差分模糊测试在真实世界Java库中的准确性和性能提升，结果显示通过早期停止平均节省了数千万次字节码执行。