Software fuzzing has become a cornerstone of automated vulnerability discovery, yet existing mutation strategies often lack semantic awareness, leading to redundant test cases and slow exploration of deep program states. In this work, I present a hybrid fuzzing framework that integrates static and dynamic analysis with Large Language Model (LLM)-guided input mutation and semantic feedback. Static analysis extracts control-flow and data-flow information, which is transformed into structured prompts that guide the LLM to generate syntactically valid and semantically diverse inputs. During execution, I augment traditional coverage-based feedback with semantic feedback signals derived from program state changes, exception types, and output semantics, allowing the fuzzer to prioritize inputs that trigger novel program behaviors beyond mere code coverage. I implement the approach atop AFL++, combining program instrumentation with embedding-based semantic similarity metrics to guide seed selection. Evaluation on real-world open-source targets, including libpng, tcpdump, and SQLite, demonstrates that my method achieves faster time-to-first-bug, higher semantic diversity, and a competitive number of unique bugs compared to state-of-the-art fuzzers. This work highlights the potential of combining LLM reasoning with semantic-aware feedback to accelerate and deepen vulnerability discovery.
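To make the semantic-feedback idea concrete, the sketch below shows one way an embedding-based novelty score could drive seed selection, as the abstract describes. This is a minimal illustration, not the paper's implementation: the names `ExecutionSignature` and `SemanticSeedScheduler` are hypothetical, TF-IDF cosine similarity stands in for whatever learned embedding model the framework actually uses, and a real integration would hook into AFL++'s scheduling rather than run standalone.

```python
# Minimal sketch (assumed design, not the paper's code): score a seed by how
# semantically novel its execution signature is relative to the kept corpus,
# using TF-IDF cosine similarity as a stand-in for a learned embedding.
from dataclasses import dataclass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


@dataclass
class ExecutionSignature:
    """Semantic feedback collected from one execution of the target (hypothetical fields)."""
    output: str        # captured stdout/stderr
    exception: str     # e.g. signal name or sanitizer report class
    state_digest: str  # summary of observed program state changes

    def as_text(self) -> str:
        return f"{self.exception} | {self.state_digest} | {self.output}"


class SemanticSeedScheduler:
    """Keeps seeds whose execution signatures are semantically novel."""

    def __init__(self) -> None:
        self.corpus_texts: list[str] = []

    def novelty(self, sig: ExecutionSignature) -> float:
        """1.0 for the first seed, otherwise 1 - max cosine similarity
        to previously kept signatures (higher means more novel)."""
        text = sig.as_text()
        if not self.corpus_texts:
            return 1.0
        vectors = TfidfVectorizer().fit_transform(self.corpus_texts + [text])
        similarities = cosine_similarity(vectors[-1], vectors[:-1])
        return 1.0 - float(similarities.max())

    def maybe_keep(self, sig: ExecutionSignature, threshold: float = 0.3) -> bool:
        """Retain the seed only if its semantic novelty exceeds the threshold."""
        if self.novelty(sig) > threshold:
            self.corpus_texts.append(sig.as_text())
            return True
        return False
```

Under this assumed design, an input that merely revisits known behavior (similar output, same exception class) scores low and is deprioritized even if it adds marginal coverage, while one producing a previously unseen combination of state changes and exception type is retained for further mutation.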