Reading code is an essential activity in software maintenance and evolution. Several studies with human subjects have investigated how different factors, such as the programming constructs and naming conventions employed, affect code readability, i.e., what makes a program easier or harder for developers to read and understand, and code legibility, i.e., what influences the ease of identifying the elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. In this paper, we examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, we conducted a systematic literature review, in which we found 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or by simply asking for their opinions (55.6%). Some studies (16.7%) rely exclusively on the latter variable. There are still few studies that monitor subjects' physical signs, such as brain activation regions (5%). Moreover, our study shows that some variables are multi-faceted. For instance, correctness can be measured as the ability to predict the output of a program, to answer questions about its behavior, or to recall parts of it. These results make it clear that different evaluation approaches require different competencies from subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To assist researchers in designing new studies and to improve our understanding of existing ones, we model program comprehension as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies are often exercised in these evaluations, whereas others are rarely targeted.