Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models

Modern extensible compiler frameworks such as MLIR enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure: the same extensibility that accelerates development also complicates maintenance of the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (it works across dialects without manual adaptation) and dialect-effective (it targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals, either by requiring manually constructed seed corpora for each dialect or by failing to be effective. We present a dialect-agnostic and dialect-effective fuzzing approach for extensible compilers that is both grammar-based and coverage-guided. It combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural and type constraints, can often be extracted automatically from the dialect specification; and (ii) these grammars can be used in combination with pre-trained large language models to automatically generate representative and diverse seed inputs from the full dialect space without requiring any manual input or training data. These seeds can then be used to bootstrap coverage-guided fuzzers. We built this approach into a tool, Germinator. When evaluated on six MLIR projects spanning 91 dialects, Germinator-generated seeds improve line coverage by 10% to 120% over grammar-based baselines. We compare against grammar-based baselines because they are the only class of existing automatic seed generators that can be applied uniformly across MLIR's heterogeneous dialect ecosystem. Germinator discovers 88 previously unknown bugs (40 confirmed), including 23 in dialects with no prior automated test generators, demonstrating effective and controllable testing of low-resource dialects at scale.
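To make the idea of grammar-derived seeds feeding a coverage-guided loop concrete, the following is a minimal sketch and not Germinator's implementation. Everything in it is an assumption made for illustration: the toy grammar (a hypothetical `toy` dialect, not extracted from any MLIR specification), the `derive`, `fake_coverage`, `select_seeds`, and `bootstrap` helpers, and the use of random grammar derivation in place of a language model. Coverage is simulated by the set of operation names and types a seed mentions; a real setup would measure compiler line coverage instead.

```python
import random

# Hypothetical toy grammar for an arithmetic-like dialect (assumption, not MLIR).
GRAMMAR = {
    "<op>": [["<result>", " = ", "<name>", " ", "<operand>", ", ",
              "<operand>", " : ", "<type>"]],
    "<name>": [["toy.add"], ["toy.mul"], ["toy.sub"]],
    "<result>": [["%r"]],
    "<operand>": [["%a"], ["%b"], ["%c"]],
    "<type>": [["i32"], ["i64"], ["f32"]],
}

FEATURES = {"toy.add", "toy.mul", "toy.sub", "i32", "i64", "f32"}


def derive(symbol, rng):
    """Expand a symbol with randomly chosen productions until only terminals remain."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    return "".join(derive(part, rng) for part in production)


def fake_coverage(seed):
    """Stand-in for compiler coverage: the op names and types a seed mentions."""
    tokens = seed.replace(",", " ").split()
    return {t for t in tokens if t in FEATURES}


def select_seeds(candidates, coverage_fn):
    """Keep a candidate only if it covers something the corpus has not yet covered."""
    corpus, covered = [], set()
    for cand in candidates:
        new = coverage_fn(cand) - covered
        if new:
            corpus.append(cand)
            covered |= new
    return corpus, covered


def bootstrap(n_candidates=200, seed=0):
    """Generate candidates from the grammar, then reduce them to a small seed corpus."""
    rng = random.Random(seed)
    candidates = [derive("<op>", rng) for _ in range(n_candidates)]
    return select_seeds(candidates, fake_coverage)


if __name__ == "__main__":
    corpus, covered = bootstrap()
    for s in corpus:
        print(s)
    print(sorted(covered))
```

In this sketch the random derivation plays the role of a plain grammar-based generator. The approach described above would instead use the extracted grammar together with a pre-trained language model to propose seeds, and would hand the resulting corpus to a coverage-guided fuzzer.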
