Context and motivation: Large Language Models (LLMs) such as ChatGPT have recently demonstrated remarkable proficiency across a range of Natural Language Processing (NLP) tasks, and their application in Requirements Engineering (RE), especially in requirements classification, has attracted increasing interest. Question/problem: We conducted an extensive empirical evaluation of the ChatGPT models text-davinci-003, gpt-3.5-turbo, and gpt-4 in both zero-shot and few-shot settings for requirements classification, asking how these models compare to two traditional classification methods, Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). Principal ideas/results: Across five diverse datasets, our results show that ChatGPT consistently outperforms LSTM, and that while ChatGPT is more effective than SVM at classifying functional requirements (FR), SVM is better at classifying non-functional requirements (NFR). Contrary to our expectations, the few-shot setting does not always improve performance; in most instances it proved suboptimal. Contribution: Our findings underscore the potential of LLMs in the RE domain and suggest that they could play a pivotal role in future software engineering processes, particularly as tools for enhancing requirements classification.
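To make the zero-shot setting concrete, the sketch below shows one plausible way to ask a chat model to label a single requirement as FR or NFR via the OpenAI Python SDK. It is an illustrative assumption, not the prompt or pipeline used in the evaluation; the model choice, prompt wording, and helper name classify_requirement are ours.

```python
# Hypothetical zero-shot FR/NFR classification sketch (not the paper's actual setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_requirement(requirement: str, model: str = "gpt-4") -> str:
    """Ask the model for a single label, 'FR' or 'NFR', with no examples (zero-shot)."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce randomness for a classification task
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the given software requirement as either "
                    "'FR' (functional) or 'NFR' (non-functional). "
                    "Answer with exactly one label."
                ),
            },
            # A few-shot variant would insert labeled example pairs here
            # as alternating user/assistant messages before the query.
            {"role": "user", "content": requirement},
        ],
    )
    return response.choices[0].message.content.strip()


print(classify_requirement(
    "The system shall respond to user queries within 2 seconds."))  # likely 'NFR'
```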