Statistical analysis is the tool of choice for turning data into information, and then information into empirical knowledge. To be valid, the process that goes from data to knowledge should be supported by detailed, rigorous guidelines, which help ferret out issues with the data or model, and lead to qualified results that strike a reasonable balance between generality and practical relevance. Such guidelines are being developed by statisticians to support the latest techniques for Bayesian data analysis. In this article, we frame these guidelines in a way that is suited to empirical research in software engineering. To demonstrate the guidelines in practice, we apply them to reanalyze a GitHub dataset about code quality in different programming languages. The dataset's original analysis (Ray et al., 2014) and a critical reanalysis (Berger et al., 2019) have attracted considerable attention, in no small part because they target a topic (the impact of different programming languages) on which strong opinions abound. The goals of our reanalysis are largely orthogonal to this previous work, as we are concerned with demonstrating, on data in an interesting domain, how to build a principled Bayesian data analysis, and with showcasing some of its benefits. In the process, we will also shed light on some critical aspects of the analyzed data and of the relationship between programming languages and code quality. The high-level conclusion of our exercise is that Bayesian statistical techniques can be applied to analyze software engineering data in a way that is principled and flexible, and that leads to convincing results that inform the state of the art while highlighting the boundaries of their validity. The guidelines can support building solid statistical analyses and connecting their results, and hence help buttress continued progress in empirical software engineering research.