Context: Stack Overflow is a valuable resource for software developers seeking answers to programming problems. Previous studies have shown that a growing number of questions are of low quality and thus receive less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) that automatically generates question titles from code snippets to improve question quality. However, the code snippets in the question body alone do not provide sufficient information for title generation, and LSTMs cannot capture long-range dependencies between tokens. Objective: This paper proposes CCBERT, a novel deep-learning-based model that improves question title generation by making full use of the bi-modal information in the entire question body. Method: CCBERT follows the encoder-decoder paradigm: it uses CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate predicted tokens, and an additional copy attention layer to refine the output distribution. Both the encoder and the decoder apply multi-head self-attention to better capture long-range dependencies. To verify the effectiveness of CCBERT, this paper builds a dataset of around 200,000 high-quality questions filtered from the data officially published by Stack Overflow. Results: CCBERT outperforms all the baseline models on the dataset. Experiments on both code-only and low-resource datasets show that CCBERT remains superior, with less performance degradation. The human evaluation also shows that CCBERT performs well on both the readability and correlation criteria.
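The abstract does not specify how the copy attention layer refines the output distribution. As a minimal sketch, assuming a pointer-generator-style mixture (an assumption, not necessarily the formulation CCBERT uses), the decoder's vocabulary distribution and the attention over source tokens could be blended through a gate. The function name `mix_copy_distribution` and the parameter `p_gen` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def mix_copy_distribution(
    vocab_probs: Dict[str, float],
    source_tokens: List[str],
    copy_attention: List[float],
    p_gen: float,
) -> Dict[str, float]:
    """Blend a generation distribution with a copy distribution.

    Assumed (hypothetical) formulation:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on
               every source position whose token equals w.
    """
    if len(source_tokens) != len(copy_attention):
        raise ValueError("one attention weight per source token is required")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    # Scale the decoder's vocabulary distribution by the generation gate.
    final = {tok: p_gen * p for tok, p in vocab_probs.items()}

    # Route the remaining mass through the copy attention: the weight on
    # each source position is added to the probability of its token, so
    # tokens absent from the vocabulary can still be produced.
    for tok, weight in zip(source_tokens, copy_attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final
```

Under this assumption, a token that appears in the question body but is missing from the decoder's vocabulary (for example, an identifier from a code snippet) still receives probability mass, which is the usual motivation for adding a copy mechanism to a title generator.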