GoldenFuzz: Generative Golden Reference Hardware Fuzzing

Modern hardware systems, driven by demands for high performance and application-specific functionality, have grown increasingly complex, introducing large attack surfaces for bugs and security-critical vulnerabilities. Fuzzing has emerged as a scalable solution for discovering such flaws. Yet existing hardware fuzzers suffer from limited semantic awareness, inefficient test refinement, and high computational overhead due to their reliance on slow device simulation. In this paper, we present GoldenFuzz, a novel two-stage hardware fuzzing framework that partially decouples test case refinement from coverage and vulnerability exploration. GoldenFuzz leverages a fast, ISA-compliant Golden Reference Model (GRM) as a "digital twin" of the Device Under Test (DUT). It fuzzes the GRM first, which enables rapid, low-cost test case refinement and accelerates deep architectural exploration and vulnerability discovery on the DUT. Throughout the fuzzing pipeline, GoldenFuzz iteratively constructs test cases by concatenating carefully chosen instruction blocks that balance inter- and intra-instruction quality. A feedback-driven mechanism that draws on both high- and low-coverage samples further enhances GoldenFuzz's capability in hardware state exploration. Our evaluation on three RISC-V processors, RocketChip, BOOM, and CVA6, demonstrates that GoldenFuzz significantly outperforms existing fuzzers, achieving the highest coverage with minimal test case length and computational overhead. GoldenFuzz uncovers all known vulnerabilities and discovers five new ones, four of which are classified as highly severe, with CVSS v3 severity scores exceeding 7 out of 10. It also identifies two previously unknown vulnerabilities in the commercial BA51-H core extension.
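
The two-stage idea described above can be illustrated with a minimal Python sketch. This is not GoldenFuzz's implementation: the block pool, the toy coverage metric, and the names `grm_coverage`, `dut_coverage`, `refine_on_grm`, and `two_stage_fuzz` are all hypothetical stand-ins chosen for illustration. The sketch only shows the control flow, in which a test case is grown block by block against a cheap reference model, and the expensive DUT simulation is invoked once per refined test case.

```python
import random
from typing import FrozenSet, List, Sequence, Set, Tuple

Block = Tuple[str, ...]  # a short sequence of instruction mnemonics

# Hypothetical block pool; a real fuzzer would generate these blocks.
BLOCK_POOL: List[Block] = [
    ("addi", "add"),
    ("lw", "sw"),
    ("beq", "addi"),
    ("csrrw",),
    ("mul", "div"),
]


def grm_coverage(test: Sequence[Block]) -> FrozenSet[str]:
    """Toy stand-in for a fast reference model (assumption, not a real GRM).

    Coverage is the set of mnemonics plus adjacent mnemonic pairs, so it
    reflects both single instructions and their ordering.
    """
    flat = [ins for block in test for ins in block]
    pairs = {f"{a}>{b}" for a, b in zip(flat, flat[1:])}
    return frozenset(flat) | frozenset(pairs)


def dut_coverage(test: Sequence[Block]) -> FrozenSet[str]:
    """Toy stand-in for slow DUT simulation; here it mirrors the GRM."""
    return grm_coverage(test)


def refine_on_grm(pool: List[Block], max_blocks: int,
                  rng: random.Random) -> List[Block]:
    """Stage 1: grow a test case by concatenating blocks, scored on the GRM."""
    test: List[Block] = []
    covered: FrozenSet[str] = frozenset()
    for _ in range(max_blocks):
        candidates = rng.sample(pool, k=min(3, len(pool)))
        # Pick the candidate block that yields the largest GRM coverage.
        best = max(candidates, key=lambda b: len(grm_coverage(test + [b])))
        new_cov = grm_coverage(test + [best])
        if len(new_cov) <= len(covered):
            break  # no candidate adds coverage; stop extending
        test.append(best)
        covered = new_cov
    return test


def two_stage_fuzz(rounds: int, seed: int = 0) -> Tuple[Set[str], int]:
    """Refine on the GRM, then spend one DUT run per refined test case."""
    rng = random.Random(seed)
    total: Set[str] = set()
    dut_runs = 0
    for _ in range(rounds):
        test = refine_on_grm(BLOCK_POOL, max_blocks=4, rng=rng)  # cheap
        total |= dut_coverage(test)                              # expensive
        dut_runs += 1
    return total, dut_runs
```

Calling `two_stage_fuzz(5)` performs five DUT runs regardless of how many candidate blocks were scored on the reference model, which is the cost asymmetry the framework is designed to exploit.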
