Software teams increasingly adopt a variety of tools and communication channels to support collaborative software development and coordinate tasks. Among these resources, Programming Community-based Question Answering (PCQA) forums have become widely used by developers, who rely on them to obtain and share technical information. To support the development and management of Open Source Software (OSS) projects, GitHub announced GitHub Discussions, a native forum that facilitates collaborative discussions between users and members of communities hosted on the platform. Because GitHub Discussions resembles PCQA forums, it faces similar challenges, including the occurrence of related discussions (duplicate or near-duplicate posts). Duplicate posts have the same content and may be exact copies, whereas near-duplicates share similar topics and information. Both can introduce noise to the platform and compromise project knowledge sharing. In this paper, we address the problem of detecting related posts in GitHub Discussions. To do so, we propose RD-Detector, an approach based on a pre-trained Sentence-BERT model. We evaluated RD-Detector using data from different OSS communities. OSS maintainers and Software Engineering (SE) researchers manually evaluated the RD-Detector results, which achieved 75% to 100% precision. In addition, maintainers pointed out practical applications of the approach, such as merging discussion threads and converting discussions into comments on one another. OSS maintainers can benefit from RD-Detector to reduce the labor-intensive work of manually detecting related discussions and answering the same question multiple times.
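The general idea of flagging related posts by embedding similarity can be illustrated with a minimal sketch. This is not the RD-Detector implementation: the function names, the example posts, and the 0.6 threshold are hypothetical, and a toy bag-of-words vector stands in for the Sentence-BERT embeddings so that the example runs with the standard library alone. In practice, the `embed` argument would be replaced by a sentence encoder.

```python
import math
import re
from collections import Counter
from itertools import combinations


def toy_embed(text):
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_pairs(posts, embed=toy_embed, threshold=0.6):
    """Return (i, j, score) for each pair of posts scoring at or above threshold.

    The threshold value is an assumption chosen for this illustration only.
    """
    vectors = [embed(p) for p in posts]
    pairs = []
    for i, j in combinations(range(len(posts)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    # Highest-scoring candidates first, so a reviewer sees the likeliest matches.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up discussion titles used only to exercise the sketch.
posts = [
    "How do I configure the proxy for the dev server?",
    "How to configure a proxy for the dev server",
    "Release notes for version 2.0",
]
candidates = related_pairs(posts)
```

Here `candidates` contains only the pair formed by the first two posts, which a maintainer could then review manually before merging threads or converting one discussion into a comment on the other.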