GraphLocator: Graph-guided Causal Reasoning for Issue Localization

The issue localization task aims to identify the locations in a software repository that require modification, given a natural-language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between issue descriptions and source code implementations. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address both mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches via dynamic issue disentanglement. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertex locating and dynamic CIG discovery: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, a 28.74% increase in performance on the downstream issue-resolving task.

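The two-phase workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CausalIssueGraph`, `discover_cig`, and `judge`, the dict-of-lists repository graph, the breadth-first expansion order, and the `max_steps` budget are all assumptions made for illustration. The `judge` callback is a hypothetical stand-in for whatever reasoning step decides whether a neighboring code entity causally contributes to an already discovered sub-issue, and the symptom vertices are assumed to have been located beforehand.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class CausalIssueGraph:
    """Hypothetical CIG container: vertices pair a code entity with a sub-issue."""
    # code entity name -> description of the sub-issue attached to it
    sub_issues: Dict[str, str] = field(default_factory=dict)
    # directed causal edges, stored as (cause_entity, effect_entity)
    edges: Set[Tuple[str, str]] = field(default_factory=set)


# judge(candidate, effect, effect_issue) returns a sub-issue description if the
# candidate entity is judged to be a cause of the effect, otherwise None.
Judge = Callable[[str, str, str], Optional[str]]


def discover_cig(
    repo_graph: Dict[str, List[str]],
    symptoms: Dict[str, str],
    judge: Judge,
    max_steps: int = 50,
) -> CausalIssueGraph:
    """Expand a CIG outward from symptom vertices (assumed breadth-first)."""
    cig = CausalIssueGraph(sub_issues=dict(symptoms))
    frontier = deque(symptoms)  # entities whose neighbors are still unexamined
    steps = 0
    while frontier and steps < max_steps:
        effect = frontier.popleft()
        steps += 1
        for neighbor in repo_graph.get(effect, []):
            sub_issue = judge(neighbor, effect, cig.sub_issues[effect])
            if sub_issue is None:
                continue  # neighbor judged unrelated; do not expand through it
            cig.edges.add((neighbor, effect))
            if neighbor not in cig.sub_issues:
                cig.sub_issues[neighbor] = sub_issue
                frontier.append(neighbor)
    return cig


# Toy repository graph: each entity maps to the entities it depends on.
repo_graph = {
    "api.handle_request": ["service.load_user", "util.log"],
    "service.load_user": ["db.parse_row"],
    "db.parse_row": [],
    "util.log": [],
}
# Phase 1 output (assumed given): the entity where the symptom is observed.
symptoms = {"api.handle_request": "request returns HTTP 500"}


def toy_judge(candidate: str, effect: str, effect_issue: str) -> Optional[str]:
    # Hard-coded verdicts standing in for a real reasoning step.
    causes = {
        "service.load_user": "raises on missing user",
        "db.parse_row": "mis-parses NULL columns",
    }
    return causes.get(candidate)


cig = discover_cig(repo_graph, symptoms, toy_judge)
```

In this toy run the graph grows from the single symptom vertex to three vertices linked by two causal edges, while `util.log` is examined and discarded. Storing one sub-issue per code entity is what lets a single issue map onto several interdependent locations, and following cause-to-effect edges backward from the symptom is what reaches entities the issue text never mentions.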