Mining Software Repositories (MSR) is an evidence-based methodology that cross-links data to uncover actionable information about software systems. Empirical studies in software engineering often leverage MSR techniques because they allow researchers to uncover issues and flaws in software development and to analyse the factors contributing to them. Hence, access to fine-grained information about the repositories and sources being mined (e.g., server names and contributors' identities) is essential for the reproducibility and transparency of MSR studies. However, this can also introduce threats to participants' privacy, as their identities may be linked to flawed or sub-optimal programming practices (e.g., code smells, improper documentation), or vice versa. Moreover, these threats can extend to close collaborators and community members, resulting in "guilt by association". This position paper aims to start a discussion about indirect participation in MSR investigations, the 'privacy vs. utility' dichotomy in sharing non-aggregated data, and its effects on privacy restrictions and ethical considerations for participant involvement.