Background: Testing and validation of the semantic correctness of patches produced by tools for Automated Program Repair (APR) has received a lot of attention. Yet, the eventual acceptance or rejection of suggested patches for real-world projects by human patch reviewers has received limited attention. Objective: To address this issue, we plan to investigate whether (possibly incorrect) security patches suggested by APR tools are recognized by human reviewers. We also want to investigate whether knowing that a patch was produced by an allegedly specialized tool changes the decision of human reviewers. Method: In the first phase, using a balanced design, we present human reviewers with a combination of patches proposed by APR tools for different vulnerabilities and ask them to adopt or reject the proposed patches. In the second phase, we tell participants that some of the proposed patches were generated by security-specialized tools (even if the tool was actually a `normal' APR tool) and measure whether the human reviewers change their decision to adopt or reject a patch. Limitations: The experiment will be conducted in an academic setting, and to maintain statistical power, it will focus on a limited sample of popular APR tools and popular vulnerability types.