Stack Overflow is one of the most popular programming communities, where developers seek help with the problems they encounter. However, inexperienced developers who fail to describe their problems clearly often struggle to attract sufficient attention and get the answers they need. We propose M$_3$NSCT5, a novel approach that automatically generates multiple post titles from given code snippets. Developers can use the generated titles to find closely related posts and to complete their problem descriptions. M$_3$NSCT5 builds on CodeT5, a pre-trained Transformer model with strong language understanding and generation abilities. The same code snippet can be aligned with different titles under different contexts; to alleviate this ambiguity, we propose a maximal marginal multiple nucleus sampling strategy that generates multiple high-quality, diverse title candidates at a time for developers to choose from. To validate the effectiveness of M$_3$NSCT5, we build a large-scale dataset of 890,000 question posts covering eight programming languages. Automatic evaluation on the BLEU and ROUGE metrics shows that M$_3$NSCT5 outperforms six state-of-the-art baseline models. Moreover, a human evaluation yields trustworthy results that further demonstrate the strong potential of our approach for real-world application.
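The abstract names the sampling strategy without detailing it. As a rough illustration only, and not the authors' implementation, the two ideas the name suggests can be sketched as follows: nucleus (top-p) sampling restricts each sampling step to the most probable tokens, and a maximal-marginal-style selection then picks candidates that are relevant but not redundant. The function names, the Jaccard token-overlap similarity, the relevance scores, and the trade-off weight `lam` are all assumptions made for this sketch.

```python
import random


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose mass reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    return {token: p / mass for token, p in kept}


def sample_token(probs, top_p, rng):
    """Draw one token from the renormalized nucleus."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def jaccard(a, b):
    """Token-overlap similarity between two titles (an assumed stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def mmr_select(candidates, k, lam=0.5):
    """Greedily pick k titles from (title, relevance) pairs, trading relevance
    against similarity to the titles already selected."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def score(item):
            title, relevance = item
            redundancy = max((jaccard(title, s) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best[0])
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    next_token_probs = {"sort": 0.5, "list": 0.3, "error": 0.15, "index": 0.05}
    print(sample_token(next_token_probs, top_p=0.75, rng=rng))

    # Hypothetical title candidates with made-up relevance scores.
    titles = [
        ("how to sort a list in python", 0.90),
        ("how to sort a list in python 3", 0.85),
        ("python list index out of range error", 0.60),
    ]
    print(mmr_select(titles, k=2))
```

In this toy setup the near-duplicate second title is skipped in favor of the less relevant but more distinct third one, which is the kind of diversity among candidates that the abstract describes.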