The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the choice of representation model for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, so does the need for a powerful post representation model, driving researchers to develop specialized models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural architectures such as convolutional neural networks (CNNs) and Transformers (e.g., BERT). Despite their promising results, these representation methods have never been evaluated under the same experimental setting. To fill this research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) on a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. To find more suitable representation models for the posts, we further explore a diverse set of BERT-based models, including (1) general-domain language models (RoBERTa and Longformer) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, and seBERT). Our evaluation, however, also illustrates the ``No Silver Bullet'' concept: none of the models consistently wins against all the others. Inspired by these findings, we propose SOBERT, which employs a simple-yet-effective strategy to improve the best-performing model by continuing the pre-training phase with textual artifacts from Stack Overflow.
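To make the continued pre-training strategy behind SOBERT concrete, the sketch below shows domain-adaptive masked-language-model pre-training with the HuggingFace transformers and datasets libraries. The starting checkpoint (microsoft/codebert-base), the corpus file so_posts.txt, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

\begin{verbatim}
# Minimal sketch: continue pre-training an existing checkpoint on
# Stack Overflow text via masked language modeling (MLM).
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Assumed starting point: an already pre-trained model such as CodeBERT.
checkpoint = "microsoft/codebert-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# "so_posts.txt" is a hypothetical dump of Stack Overflow post bodies,
# one post per line.
dataset = load_dataset("text", data_files={"train": "so_posts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens; the model learns to recover them,
# adapting its representations to Stack Overflow's vocabulary.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

args = TrainingArguments(output_dir="sobert-checkpoints",
                         per_device_train_batch_size=16,
                         num_train_epochs=1)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
\end{verbatim}

The resulting checkpoint can then be fine-tuned on downstream tasks such as tag recommendation, relatedness prediction, or API recommendation in the usual way.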