Automated bug priority inference can reduce the time overhead of bug triagers for priority assignment, improving the efficiency of software maintenance. Currently, there are two orthogonal lines of work on this task, i.e., traditional machine learning based (TML-based) and neural network based (NN-based) approaches. Although these approaches achieve competitive performance, we observe that they face the following two issues: 1) TML-based approaches require extensive manual feature engineering and cannot learn the semantic information of bug reports; 2) Both TML-based and NN-based approaches cannot effectively address the label imbalance problem because they struggle to distinguish the semantic differences between bug reports with different priorities. In this paper, we propose CLeBPI (Contrastive Learning for Bug Priority Inference), which leverages a pre-trained language model and contrastive learning to tackle the above two issues. Specifically, CLeBPI is first pre-trained on a large-scale bug report corpus in a self-supervised way, so that it can automatically learn contextual representations of bug reports without manual feature engineering. Afterward, it is further pre-trained with a contrastive learning objective, which enables it to distinguish semantic differences between bug reports and thus learn more precise contextual representations for each bug report. After pre-training, we connect a classification layer to CLeBPI and fine-tune it for bug priority inference in a supervised way. To verify the effectiveness of CLeBPI, we choose four baseline approaches and conduct comparison experiments on a public dataset. The experimental results show that CLeBPI outperforms all baseline approaches by 23.86%-77.80% in terms of weighted average F1-score, demonstrating its effectiveness.
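The contrastive pre-training step described above can be illustrated with an InfoNCE-style loss, a common formulation in which each bug report's representation is pulled toward a positive (e.g., an augmented view of the same report) and pushed away from in-batch negatives. This is only a minimal NumPy sketch under that assumption; the paper's exact objective, encoder, and augmentation scheme are not specified here, and all names below are hypothetical.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over embedding pairs.

    Row i of `positives` is the positive example for row i of
    `anchors`; all other rows serve as in-batch negatives.
    """
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching pairs) as gold labels
    return -np.mean(np.diag(log_probs))

# Toy "bug report embeddings": positives are lightly perturbed copies,
# standing in for two augmented views of the same report
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
views = emb + 0.01 * rng.normal(size=(8, 32))
loss = info_nce_loss(emb, views)
```

Because each anchor is far more similar to its own perturbed view than to other reports in the batch, the loss is close to zero here; mismatched pairs would drive it up, which is exactly the signal that separates semantically different bug reports.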