Information Retrieval-based Bug Localization (IRBL) aims to identify buggy source files for a given bug report. Traditional and deep-learning-based IRBL techniques often suffer from vocabulary mismatch and dependence on project-specific metadata, while recent Large Language Model (LLM)-based approaches are limited by insufficient contextual information. To address these issues, we propose GenLoc, an LLM-based technique that combines semantic retrieval with code-exploration functions to iteratively analyze the code base and identify potentially buggy files. We evaluate GenLoc on two diverse datasets: a benchmark of 9,097 bugs from six large open-source projects and the GHRB (GitHub Recent Bugs) dataset of 131 recent bugs across 16 projects. The results demonstrate that GenLoc substantially outperforms traditional IRBL techniques, deep-learning-based approaches, and recent LLM-based methods, while also localizing bugs that other techniques fail to detect.