Context: Classifying software requirements into categories is a critically important task in requirements engineering (RE). Developing machine learning (ML) approaches for requirements classification has attracted great interest in the RE community since the 2000s. Objective: This paper aims to address two related problems that have hindered real-world applications of ML approaches: class imbalance and high-dimensional, low-sample-size (HDLSS) data. These problems can greatly degrade the classification performance of ML methods. Method: The paper proposes HC4RC, a novel ML approach for multiclass classification of requirements. HC4RC addresses these problems through semantic-role-based feature selection, dataset decomposition and hierarchical classification. We experimentally compare the effectiveness of HC4RC with three closely related approaches: two are based on a traditional statistical classification model, and one uses an advanced deep learning model. Results: Our experiment shows that: 1) The class imbalance and HDLSS problems present a challenge to both traditional and advanced ML approaches. 2) The HC4RC approach is simple to use and addresses the class imbalance and HDLSS problems more effectively than similar approaches. Conclusion: This paper makes an important practical contribution to addressing the class imbalance and HDLSS problems in multiclass classification of software requirements.
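To make the idea of dataset decomposition and hierarchical classification concrete, the following is a minimal, generic sketch and not the authors' HC4RC implementation. It assumes a two-level hierarchy in which a top-level classifier first picks a coarse group and a per-group classifier then picks the final class, so that each classifier is trained on a smaller, less skewed subset. The class labels (`F`, `SE`, `PE`, `US`), the grouping, the toy requirements, and the `CentroidClassifier` base learner are all hypothetical and chosen only for illustration; semantic-role-based feature selection is not modeled here.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real feature extraction)."""
    return text.lower().split()


class CentroidClassifier:
    """Toy base learner: scores a text by its dot product with each class's
    mean term-frequency vector. Any real classifier could replace it."""

    def fit(self, texts, labels):
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for text, label in zip(texts, labels):
            sums[label].update(tokenize(text))
        # Mean term frequency per class.
        self.centroids = {
            label: {tok: n / counts[label] for tok, n in bag.items()}
            for label, bag in sums.items()
        }
        return self

    def predict_one(self, text):
        bag = Counter(tokenize(text))

        def score(label):
            centroid = self.centroids[label]
            return sum(centroid.get(tok, 0.0) * n for tok, n in bag.items())

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.centroids), key=score)


class TwoLevelClassifier:
    """Hypothetical two-level hierarchy: group first, then class within group.
    Assumes every group has at least one training example."""

    def __init__(self, groups):
        self.groups = groups  # group name -> list of leaf classes
        self.leaf_to_group = {
            leaf: group for group, leaves in groups.items() for leaf in leaves
        }

    def fit(self, texts, labels):
        # Top level: relabel every example with its group.
        group_labels = [self.leaf_to_group[label] for label in labels]
        self.top = CentroidClassifier().fit(texts, group_labels)
        # Second level: one classifier per group, trained on that group's subset.
        self.sub = {}
        for group in self.groups:
            subset = [
                (t, l) for t, l in zip(texts, labels)
                if self.leaf_to_group[l] == group
            ]
            sub_texts, sub_labels = zip(*subset)
            self.sub[group] = CentroidClassifier().fit(sub_texts, sub_labels)
        return self

    def predict_one(self, text):
        group = self.top.predict_one(text)
        return self.sub[group].predict_one(text)


# Toy, imbalanced data: four functional requirements, one of each quality class.
train = [
    ("the system shall allow users to create an account", "F"),
    ("the system shall allow users to export reports", "F"),
    ("the system shall allow admins to delete records", "F"),
    ("the system shall send email notifications", "F"),
    ("passwords must be encrypted with strong encryption", "SE"),
    ("pages must load within two seconds", "PE"),
    ("the interface must be easy to learn", "US"),
]
texts, labels = zip(*train)

groups = {"functional": ["F"], "quality": ["SE", "PE", "US"]}
model = TwoLevelClassifier(groups).fit(texts, labels)

print(model.predict_one("all stored data must be encrypted"))
print(model.predict_one("the system shall allow users to reset passwords"))
```

In this sketch the top-level split is 4 versus 3 examples instead of 4 versus 1, 1, and 1, which is the intuition behind decomposing an imbalanced multiclass problem; how the groups are actually chosen is left open here.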