The rise of large language models (LLMs) has sparked interest in coding assistants. While general-purpose programming languages are well supported, generating code for domain-specific languages remains a challenging problem for LLMs. In this paper, we focus on LLM-based code generation for Answer Set Programming (ASP), a particularly effective approach for solving combinatorial search problems. The effectiveness of LLMs in ASP code generation is currently hindered by the limited number of examples seen during their initial pre-training phase. We introduce a novel ASP-solver-in-the-loop approach for solver-guided instruction-tuning of LLMs to address the highly complex semantic parsing task inherent in ASP code generation. Our method requires only problem specifications in natural language and their solutions. Specifically, we sample ASP statements as program continuations from LLMs for solving logic puzzles. Leveraging the declarative nature of ASP, in which partial encodings progressively narrow down the solution space, we use solver feedback to categorize the sampled statements into chosen and rejected instances. We then apply supervised fine-tuning to train LLMs on the curated data and further improve robustness with a solver-guided search that includes best-of-N sampling. Our experiments demonstrate consistent improvements in two distinct prompting settings on two datasets.
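To make the solver-feedback categorization concrete, the following is a minimal sketch (not the authors' implementation) of how a sampled ASP program continuation could be labeled as chosen or rejected, assuming the clingo Python API. The helper names `admits_solution`, `categorize`, `partial_program`, `candidate_rule`, and `known_solution_atoms` are hypothetical and introduced here purely for illustration.

```python
# Minimal sketch of solver-guided labeling of sampled ASP continuations,
# assuming the clingo Python API. Names and the check itself are illustrative.
import clingo


def admits_solution(program: str, solution_atoms: set[str]) -> bool:
    """Rough check: is the program together with the known solution atoms
    (added as facts) still satisfiable, i.e. has the partial encoding not
    ruled the intended solution out?"""
    constrained = program + "\n" + "\n".join(f"{a}." for a in solution_atoms)
    ctl = clingo.Control(["1"])          # one model is enough for the check
    ctl.add("base", [], constrained)
    ctl.ground([("base", [])])
    return bool(ctl.solve().satisfiable)


def categorize(partial_program: str, candidate_rule: str,
               known_solution_atoms: set[str]) -> str:
    """Label a sampled program continuation as 'chosen' or 'rejected'
    based on solver feedback."""
    extended = partial_program + "\n" + candidate_rule
    try:
        ok = admits_solution(extended, known_solution_atoms)
    except RuntimeError:                 # clingo raises on parse errors
        ok = False
    return "chosen" if ok else "rejected"


# Toy usage on an illustrative puzzle fragment.
partial = ("person(alice). person(bob). "
           "1 { owns(P, cat); owns(P, dog) } 1 :- person(P).")
rule = ":- owns(alice, dog)."            # sampled continuation
print(categorize(partial, rule, {"owns(alice, cat)", "owns(bob, dog)"}))
```

Under these assumptions, continuations that make the known solution unreachable (or fail to parse) end up in the rejected set, while the rest are kept as chosen instances for supervised fine-tuning.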