TODO comments are widely used by software developers to describe pending tasks during development. However, after completing a task, developers sometimes neglect or simply forget to remove the corresponding TODO comment, leaving it obsolete. Such obsolete TODO comments can confuse development teams and may lead to bugs later, reducing the software's quality and maintainability. In this work, we propose a novel model, named TDCleaner (TODO comment Cleaner), to identify obsolete TODO comments in software projects. TDCleaner can assist developers with just-in-time checking of TODO comment status and help them avoid leaving obsolete TODO comments behind. Our approach has two main stages: offline learning and online prediction. During offline learning, we first automatically construct <code_change, todo_comment, commit_msg> training samples and use three neural encoders to capture the semantic features of the TODO comment, the code change, and the commit message, respectively. TDCleaner then automatically learns the correlations and interactions between the different encoders to estimate the final status of the TODO comment. For online prediction, we apply the offline-trained model to estimate how likely a given TODO comment is to be obsolete. We built our dataset by collecting TODO comments from the top 10,000 Python and Java GitHub repositories and evaluated TDCleaner on them. Extensive experimental results show the promising performance of our model over a set of benchmarks. We also performed an in-the-wild evaluation with real-world software projects: we reported 18 obsolete TODO comments identified by TDCleaner to GitHub developers, and 9 of them have already been confirmed and removed by the developers, demonstrating the practical usefulness of our approach.
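To make the <code_change, todo_comment, commit_msg> sample format concrete, the following minimal sketch shows one plausible way such triples could be assembled from a commit. It is an illustration under stated assumptions, not the authors' pipeline: the names `TodoSample` and `extract_todo_samples`, the regular expression, and the choice to treat a removed TODO line as the comment of interest are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: a TODO comment is a Python (#) or Java (//) comment
# containing the word TODO. Block comments are not handled here.
TODO_PATTERN = re.compile(r"(#|//).*\bTODO\b", re.IGNORECASE)


@dataclass
class TodoSample:
    code_change: str   # changed diff lines other than the TODO comment itself
    todo_comment: str  # text of the removed TODO comment
    commit_msg: str    # commit message that accompanied the change


def extract_todo_samples(diff_text: str, commit_msg: str) -> List[TodoSample]:
    """Return one sample per TODO comment line removed by a unified diff."""
    # Keep added/removed lines only; skip the "+++"/"---" file header lines.
    changed = [
        line for line in diff_text.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
    samples = []
    for index, line in enumerate(changed):
        if line.startswith("-") and TODO_PATTERN.search(line):
            # Every other changed line is treated as the code change.
            rest = changed[:index] + changed[index + 1:]
            samples.append(
                TodoSample("\n".join(rest), line[1:].strip(), commit_msg)
            )
    return samples
```

Calling `extract_todo_samples(diff_text, commit_msg)` on a commit's unified diff yields one triple per removed TODO comment. In the approach described above, each element of such a triple would be passed to its own neural encoder; the encoders themselves are beyond the scope of this sketch.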