Despite decades of research, software bug localization remains challenging due to heterogeneous content and inherent ambiguities in bug reports. Existing methods such as Information Retrieval (IR)-based approaches often attempt to match source documents to bug reports, overlooking the context and semantics of the source code. On the other hand, Large Language Models (LLM) (e.g., Transformer models) show promising results in understanding both texts and code. However, they have not been yet adapted well to localize software bugs against bug reports. They could be also data or resource-intensive. To bridge this gap, we propose, IQLoc, a novel bug localization approach that capitalizes on the strengths of both IR and LLM-based approaches. In particular, we leverage the program semantics understanding of transformer-based models to reason about the suspiciousness of code and reformulate queries during bug localization using Information Retrieval. To evaluate IQLoc, we refine the Bench4BL benchmark dataset and extend it by incorporating ~30% more recent bug reports, resulting in a benchmark containing ~7.5K bug reports. We evaluated IQLoc using three performance metrics and compare it against four baseline techniques. Experimental results demonstrate its superiority, achieving up to 58.52% and 60.59% in MAP, 61.49% and 64.58% in MRR, and 69.88% and 100.90% in HIT@K for the test bug reports with random and time-wise splits, respectively. Moreover, IQLoc improves MAP by 91.67% for bug reports with stack traces, 72.73% for those that include code elements, and 65.38% for those containing only descriptions in natural language. By integrating program semantic understanding into Information Retrieval, IQLoc mitigates several longstanding challenges of traditional IR-based approaches in bug localization.
翻译:暂无翻译