Most software maintenance and evolution tasks require developers to understand the source code of their software systems. Developers usually inspect class comments to learn about program behavior, regardless of the programming language they are using. Unfortunately, (i) different programming languages prescribe language-specific code commenting notations and guidelines; and (ii) the source code of software projects often lacks comments that adequately describe class behavior, which complicates program comprehension and evolution activities. To address these challenges, this paper investigates the language-specific class commenting practices of three programming languages: Python, Java, and Smalltalk. In particular, we systematically analyze the similarities and differences of the information types found in class comments of projects developed in these languages. We propose an approach that leverages two techniques, Natural Language Processing and Text Analysis, to automatically identify the types of semantic information found in class comments. To the best of our knowledge, no previous work has provided a comprehensive taxonomy of class comment types for these three programming languages using a common automated approach. Our results confirm that our approach can classify frequent class comment information types with high accuracy for Python, Java, and Smalltalk. We believe this work can help to monitor and assess the quality and evolution of code comments in different programming languages, and thus support maintenance and evolution tasks.
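To make the idea of text-analysis-based comment classification concrete, the following is a minimal sketch, not the approach evaluated in this paper. It assumes a tiny hand-labeled set of comment sentences and assigns a new sentence the label of its most similar training sentence under TF-IDF cosine similarity. The category names (`summary`, `usage`, `deprecation`), the training sentences, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical labeled comment sentences; the labels are illustrative only
# and are not the taxonomy proposed in the paper.
TRAINING = [
    ("Represents a bank account with a balance.", "summary"),
    ("This class models a node in a linked list.", "summary"),
    ("Example usage: create an instance and call run.", "usage"),
    ("Call connect before you send any messages.", "usage"),
    ("Deprecated since version two, will be removed.", "deprecation"),
    ("This class is deprecated, use the new client instead.", "deprecation"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Compute a smoothed inverse document frequency for every known term."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Build a sparse TF-IDF vector, ignoring terms unseen in training."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(sentence, training=TRAINING):
    """Return the label of the most similar training sentence, or None."""
    docs = [tokenize(text) for text, _ in training]
    idf = build_idf(docs)
    query = tfidf(tokenize(sentence), idf)
    best_label, best_score = None, 0.0
    for tokens, (_, label) in zip(docs, training):
        score = cosine(query, tfidf(tokens, idf))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `classify("This class is deprecated and will be removed soon.")` yields the hypothetical `deprecation` label, while a sentence that shares no vocabulary with the training set yields `None`. A realistic pipeline would need a much larger labeled corpus per language and a trained classifier; the nearest-neighbor rule here is only a stand-in.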