Large Language Models (LLMs) have shown a surprising level of performance on multiple software engineering problems. However, they have not yet been applied to fault localization (FL), in which one must find the code element responsible for a bug in a potentially vast software repository. Applying LLMs to FL could nonetheless benefit developers in terms of both performance and explainability. In this work, we present AutoFL, an automated fault localization technique that requires only a single failing test and, as part of its fault localization process, generates an explanation of why the given test fails. Using the function call API of the ChatGPT large language model, we provide tools that allow it to explore a large source code repository; this would otherwise pose a significant challenge, as it would be impossible to fit all the source code within the limited prompt length. Our results indicate that on the widely used Defects4J benchmark, AutoFL identified the faulty method on the first try more often than any of the standalone techniques from prior work that we compared against. Nonetheless, there is ample room to improve performance, and we encourage further experimentation with language model-based fault localization as a promising research area.